ON THE UNBIASED CHARACTER OF LIKELIHOOD-RATIO TESTS
FOR INDEPENDENCE IN NORMAL SYSTEMS 

By Joseph F. Daly 


1. Introduction. In the statistical interpretation of experimental data, the basic assumption is, of course, that we are dealing with a sample from a statistical population, the elements of which are characterized by the values of a number of random variables $x^1, \dots, x^k$. But in many cases we are in a position to assume even more, namely, that the population has an elementary probability law $f(x^1, \dots, x^k; \theta^1, \dots, \theta^h)$, where the functional form of $f(x, \theta)$ is definitely specified, although the parameters $\theta^1, \dots, \theta^h$ are to be left free for the moment to have values corresponding to any point of a set $\Omega$ in an $h$-dimensional space.

Under this assumption, the problem of obtaining from the data further information about the hypothetical distribution law $f(x, \theta)$ is considerably simplified. For it is then equivalent to that of deciding whether or not the data support the hypothesis that the population values of the $\theta$'s correspond to a point in a certain subset $\omega$ of $\Omega$. For example, we may have reason to believe that the population $K$ has a distribution law of the form


$$f(x^1, x^2; a^1, a^2, A_{11}, A_{12}, A_{22}) = \frac{|A_{ij}|^{1/2}}{2\pi}\, e^{-\frac{1}{2}\sum_{i,j=1}^{2} A_{ij}(x^i - a^i)(x^j - a^j)}.$$


Here the set $\Omega$ is composed of all parameter points $(a^1, \dots, A_{22})$ for which the matrix $\|A_{ij}\|$ $(i, j = 1, 2)$ is positive definite and for which $-\infty < a^i < \infty$. We may wish to decide, on the basis of $N$ independent observations $(x^1_\alpha, x^2_\alpha)$ drawn from $K$, whether $A_{12}$ has the value zero for the population in question, without concerning ourselves at all about the values of the remaining parameters; in other words, we may wish to test the hypothesis $H$ that the parameter point corresponding to $K$ lies in that subset of $\Omega$ for which $A_{12} = 0$. One way to test this hypothesis is to select some (measurable) function $g(x)$ whose value can be determined from the data, say


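$$g(x) = \frac{\sum_{\alpha}(x^1_\alpha - \bar{x}^1)(x^2_\alpha - \bar{x}^2)}{\left[\sum_{\alpha}(x^1_\alpha - \bar{x}^1)^2 \sum_{\alpha}(x^2_\alpha - \bar{x}^2)^2\right]^{1/2}}.$$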


Now $g(x)$ is itself a random variable, so that it has a distribution law of its own when its constituent $x$'s are drawn from any particular population $K$. Suppose then we choose a set of values of $g(x)$, say $S$, such that the probability is only .05 that $g(x)$ will lie in the set $S$ when the $x$'s are drawn independently from a population $K$ for which the above hypothesis $H$ is true. Ordinarily we would take $S$ to be of the form $|g(x)| > g_0$, and the test would then reject $H$ at the .05 probability level if the computed value of $g(x)$ came out too large in absolute value. But for all that has been said so far, we are perfectly free to choose a different critical region $S$, and even a different function $g(x)$. The essential elements of this type of test are then a critical region $S$, a function of the data $g$, and a probability level $\varepsilon$, such that the probability is $\varepsilon = .05$, say, that $g \in S$ when $H$ is true; in employing the test we reject $H$ at the given probability level whenever the sample value of $g$ falls in the critical region.
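As an illustration only (it is not part of the original argument), the following Python sketch computes the statistic $g(x)$ of the example above, which is the sample correlation coefficient, and approximates the critical value $g_0$ of the region $|g(x)| > g_0$ at the .05 level by simulating samples from a population in which $H$ is true. Because $g(x)$ is unchanged by changes of location and scale, its distribution under $H$ does not depend on $a^1, a^2, A_{11}, A_{22}$, so standard normal variates are used. The function names and the number of replications are assumptions of the sketch.

```python
import math
import random


def sample_correlation(xs, ys):
    """The statistic g(x): sample correlation of the pairs (x^1_a, x^2_a)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def simulated_critical_value(n_obs, eps=0.05, reps=20000, seed=1):
    """Approximate g_0 with P{|g| > g_0} = eps when H (A_12 = 0) is true."""
    rng = random.Random(seed)
    values = []
    for _ in range(reps):
        # Under H the two variates are independent; means and variances
        # do not matter because g(x) is location- and scale-invariant.
        xs = [rng.gauss(0.0, 1.0) for _ in range(n_obs)]
        ys = [rng.gauss(0.0, 1.0) for _ in range(n_obs)]
        values.append(abs(sample_correlation(xs, ys)))
    values.sort()
    return values[int((1.0 - eps) * reps)]  # empirical (1 - eps) quantile of |g|


if __name__ == "__main__":
    g0 = simulated_critical_value(20)
    print(f"N = 20, level .05: reject H when |g(x)| > {g0:.3f}")
```

For $N = 20$ the simulated value should lie close to the familiar tabulated two-sided 5 per cent point of the correlation coefficient with 18 degrees of freedom, about 0.444.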

By the very nature of the problem, any inferences we make from a sample are subject to possible error. In the kind of test under consideration, the only error we can commit, strictly speaking, is that of rejecting $H$ when it is true (an error of Type I in the terminology of Neyman and Pearson [9]). The risk of such an error is thus known in advance; for if we use the test consistently at, say, the .05 level, we know that the probability is .05 that we shall be led to reject a given hypothesis when it is true. On the other hand, it is quite conceivable that the test may be even less likely to reject $H$ when it is false, or more precisely, when the true $\theta$'s correspond to a point of $\Omega$ which is not in $\omega$. In this event the test is said to be biased. Let us make this term more definite by proposing the following definitions:

Definition I. A test is said to be completely unbiased if it has the property that for any probability level $\varepsilon$ $(0 < \varepsilon < 1)$ the probability of rejecting $H$ is greater when the $\theta$'s correspond to a point of $\Omega - \omega$ than when they correspond to a point of $\omega$.

Definition II. A test is said to be locally unbiased if the set $\Omega$ contains a neighborhood $U$ of $\omega$ such that for any probability level $\varepsilon$ $(0 < \varepsilon < 1)$ the probability of rejecting $H$ is greater when the parameter values correspond to a point of $U - \omega$ than when they correspond to a point of $\omega$.
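Definition I can be illustrated empirically for the correlation test of the introductory example. The sketch below is an illustration under stated assumptions rather than part of the paper: it estimates by simulation the probability that $|g(x)|$ exceeds 0.444 (the usual two-sided .05 point of the sample correlation coefficient for $N = 20$) when the population correlation coefficient is `rho`. For a completely unbiased test the estimate should exceed .05 at every `rho` other than zero. The helper names and simulation sizes are hypothetical.

```python
import math
import random


def sample_correlation(xs, ys):
    """Sample correlation coefficient g(x) of paired observations."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def rejection_rate(rho, g0=0.444, n_obs=20, reps=4000, seed=3):
    """Monte Carlo estimate of P{|g| > g0} for a bivariate normal with correlation rho."""
    rng = random.Random(seed)
    s = math.sqrt(1.0 - rho * rho)
    rejections = 0
    for _ in range(reps):
        xs = [rng.gauss(0.0, 1.0) for _ in range(n_obs)]
        # y = rho * x + sqrt(1 - rho^2) * z has unit variance and correlation rho with x
        ys = [rho * x + s * rng.gauss(0.0, 1.0) for x in xs]
        if abs(sample_correlation(xs, ys)) > g0:
            rejections += 1
    return rejections / reps


if __name__ == "__main__":
    for rho in (0.0, 0.2, 0.4, 0.6):
        print(f"rho = {rho:.1f}: estimated P(reject H) = {rejection_rate(rho):.3f}")
```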

It is the purpose of this paper to consider the question of bias in connection with the Neyman-Pearson method of likelihood ratios [8] as applied to the testing of what may well be called hypotheses of independence in multivariate normal populations. The likelihood ratio method is undoubtedly a very familiar one, since the vast majority of tests in present statistical practice are based on this method. But for the sake of completeness we shall outline it briefly. Let the distribution law of the population $K$ be of the form $f(x^1, \dots, x^k; \theta^1, \dots, \theta^h)$, where the $\theta$'s may correspond to any point in a set $\Omega$, and let the hypothesis $H$ to be tested be that the $\theta$'s actually belong to the subset $\omega$ of $\Omega$. Form the likelihood function

$$P_N(x, \theta) = \prod_{\alpha=1}^{N} f(x^1_\alpha, \dots, x^k_\alpha; \theta^1, \dots, \theta^h),$$

i.e., the elementary probability law of a sample of $N$ elements drawn independently from $K$. Denote by $P_\Omega(x)$ the maximum of $P_N$ for fixed $x$ when the $\theta$'s are allowed to range over $\Omega$; and denote by $P_\omega(x)$ the corresponding maximum value when the $\theta$'s are restricted to $\omega$. The test criterion is then

$$\lambda = \frac{P_\omega(x)}{P_\Omega(x)}.$$




Evidently $\lambda$ depends only on the observable quantities $x$, and has the range $0 \le \lambda \le 1$, with a definite probability law depending on that of the basic population $K$. In this method the critical region $S$ is taken to be $0 \le \lambda \le \lambda_\varepsilon$, where $\lambda_\varepsilon$ is so chosen that the probability $P\{\lambda \le \lambda_\varepsilon\}$ is $\varepsilon$ when the parameters of $K$ correspond to a point in $\omega$. (It may be noted here that in all the cases with which we shall have to deal the probability that $\lambda$ lies in $S$ when $H$ is true is independent of the particular values of the $\theta$'s as long as they correspond to a point of $\omega$.) The reason for taking the critical region to be of the form $0 \le \lambda \le \lambda_\varepsilon$, and not, say, $\lambda' \le \lambda \le \lambda''$ or $\lambda_\varepsilon \le \lambda \le 1$, may become clearer when we examine the resulting tests for bias.
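For the bivariate example of the introduction both maxima can be written down explicitly, since (a standard fact, not derived in the paper) the maximum-likelihood estimates of the means are the sample means and those of the covariances are the sample moments with divisor $N$; under $\omega$ the covariance is simply set to zero. The sketch below, with hypothetical function names, evaluates $\log P_\Omega$ and $\log P_\omega$ in closed form and confirms numerically that here $\lambda = (1 - g^2)^{N/2}$, so that the region $0 \le \lambda \le \lambda_\varepsilon$ coincides with $|g(x)| \ge g_0$ for a suitable $g_0$.

```python
import math


def max_log_likelihoods(xs, ys):
    """Return (log P_Omega, log P_omega) for a bivariate normal sample.

    Omega: all means and positive definite ||A_ij||; omega: A_12 = 0.
    Both maxima use the ML estimates (sample means, divisor-N moments).
    """
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    s11 = sum((x - mx) ** 2 for x in xs) / n
    s22 = sum((y - my) ** 2 for y in ys) / n
    s12 = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    # For k = 2 variates the maximized log-likelihood is
    # -(N k / 2) log(2 pi) - (N / 2) log|Sigma_hat| - N k / 2.
    const = -n * math.log(2.0 * math.pi) - n
    log_p_big = const - 0.5 * n * math.log(s11 * s22 - s12 ** 2)
    log_p_small = const - 0.5 * n * math.log(s11 * s22)  # covariance fixed at zero
    return log_p_big, log_p_small


def likelihood_ratio(xs, ys):
    """lambda = P_omega / P_Omega."""
    log_p_big, log_p_small = max_log_likelihoods(xs, ys)
    return math.exp(log_p_small - log_p_big)


if __name__ == "__main__":
    xs = [1.2, 0.4, 2.3, 1.9, 0.7, 1.5]
    ys = [0.8, 0.1, 1.9, 2.2, 0.9, 1.1]
    print(likelihood_ratio(xs, ys))
```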

The recent work of Neyman and Pearson [10] has led them to lay considerable stress on the importance of unbiased tests. And though their attention has been directed mainly to the broader outlines of the theory of testing hypotheses, they have stimulated other writers to study particular tests of great practical importance. P. C. Tang [11] has obtained the general sampling distribution of $1 - \lambda^{2/N}$ for what we shall call the regression problem with one dependent variate, and has given tables of $P\{\lambda \le \lambda_\varepsilon\}$ (in effect proving the unbiased character of the test) which should be extremely useful. His article also contains an excellent discussion of the manner in which this test is related to the well known tests of linear hypotheses [7] and to the ordinary analysis of variance. P. L. Hsu [6] has shown that this same distribution is fundamental in the study of Hotelling's generalized $T$ test [6] (a special but important case of what we shall call the general regression problem), and has proved that (locally) this test is not only unbiased but "most powerful" in a certain sense. On the other hand, it is not true that all likelihood ratio tests are unbiased [2]. Consequently, the knowledge that in a rather wide class of problems which arise in normal sampling theory the method of likelihood ratios furnishes tests which are either locally or completely unbiased would seem to be of some value, even when the exact sampling distribution of the criterion is too complicated to tabulate.

2. The regression problem with one dependent variate. Suppose that $y$ is known to be normally distributed about a linear function of the fixed variables $x^1, \dots, x^r$, so that the family of populations under consideration is characterized by a distribution function of the form

$$f(y \mid x, b, \sigma^2) = (2\pi\sigma^2)^{-1/2}\, e^{-\frac{1}{2\sigma^2}\left(y - \sum_{i=1}^{r} b_i x^i\right)^2}, \tag{2.1}$$

where the set of admissible values of $\sigma^2$ and the $b$'s is

$$\Omega:\quad 0 < \sigma^2 < \infty, \qquad -\infty < b_i < \infty.$$


Let $H$ be the hypothesis that the point $(\sigma^2, b_1, \dots, b_r)$ lies in the subset of $\Omega$ defined by

$$\omega:\quad b_{q+1} = \dots = b_r = 0.$$
The likelihood ratio appropriate to testing the hypothesis $H$ on the basis of $N$ $(N > r)$ independent observations drawn from such a population is then

$$\lambda = \left[\frac{\min_{\Omega}\sum_{\alpha=1}^{N}\left(y_\alpha - \sum_{i=1}^{r} b_i x^i_\alpha\right)^2}{\min_{\omega}\sum_{\alpha=1}^{N}\left(y_\alpha - \sum_{i=1}^{q} b_i x^i_\alpha\right)^2}\right]^{N/2},$$

with the understanding that the values of the fixed variables $x^1, \dots, x^r$ associated with the $\alpha$-th observation have been so chosen that the matrix $\|a^{ij}\| = \left\|\sum_{\alpha=1}^{N} x^i_\alpha x^j_\alpha\right\|$ is positive definite. (The expression in the numerator is the minimum of $\sum_{\alpha=1}^{N}\left(y_\alpha - \sum_{i=1}^{r} b_i x^i_\alpha\right)^2$ for variations of the $b$'s over $\Omega$, while the denominator contains the corresponding minimum for variations of the $b$'s over $\omega$.)
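The two minima in $\lambda$ are ordinary residual sums of squares, so $\lambda^{2/N}$ and Tang's quantity $\xi = 1 - \lambda^{2/N}$ can be computed by least squares. The following sketch assumes nothing beyond the definitions just given: it solves the normal equations $\|a^{ij}\|\, b = \|\sum_\alpha x^i_\alpha y_\alpha\|$ by Gaussian elimination, once for all $r$ fixed variables and once for the first $q$ of them. The function names are hypothetical.

```python
def solve_linear(matrix, rhs):
    """Solve matrix @ z = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * z[c] for c in range(r + 1, n))
        z[r] = (aug[r][n] - tail) / aug[r][r]
    return z


def residual_sum_of_squares(y, columns):
    """min over b of sum_a (y_a - sum_i b_i x^i_a)^2, via the normal equations."""
    k = len(columns)
    # the matrix ||a^{ij}||, assumed positive definite as in the text
    a = [[sum(u * v for u, v in zip(columns[i], columns[j])) for j in range(k)]
         for i in range(k)]
    rhs = [sum(u * v for u, v in zip(columns[i], y)) for i in range(k)]
    b = solve_linear(a, rhs)
    fitted = [sum(b[i] * columns[i][alpha] for i in range(k)) for alpha in range(len(y))]
    return sum((obs - fit) ** 2 for obs, fit in zip(y, fitted))


def tang_xi(y, columns, q):
    """xi = 1 - lambda^(2/N): Omega uses all r columns, omega only the first q."""
    return 1.0 - residual_sum_of_squares(y, columns) / residual_sum_of_squares(y, columns[:q])


if __name__ == "__main__":
    y = [1.0, 2.5, 2.0, 4.5, 5.0]
    ones = [1.0] * 5
    t = [1.0, 2.0, 3.0, 4.0, 5.0]
    xi = tang_xi(y, [ones, t], q=1)
    print(f"xi = {xi:.4f}, lambda = {(1.0 - xi) ** (len(y) / 2):.4f}")
```

With $x^1 \equiv 1$ and $q = 1$ the hypothesis $\omega$ says only that $y$ has a constant mean, and $\xi$ reduces to the squared sample correlation between $y$ and $x^2$.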

In order to show that the test is unbiased, we shall make use of the exact sampling distribution of the quantity

$$\xi = 1 - \lambda^{2/N},$$

first published by P. C. Tang [11]. Writing $\|A_{ij}\|$ for the inverse of the matrix $\|a^{ij}\|$ $(i, j = 1, \dots, q)$ composed of the first $q$ rows and columns of $\|a^{ij}\|$, let us put

$$G = \frac{1}{2\sigma^2}\sum_{p,s=q+1}^{r}\Big(a^{ps} - \sum_{i,j=1}^{q} a^{pi}A_{ij}a^{js}\Big)\, b_p b_s.$$

Since the critical region $0 \le \lambda \le \lambda_\varepsilon$ corresponds to the region $1 - \lambda_\varepsilon^{2/N} = \xi_\varepsilon \le \xi \le 1$, it can then be shown that the probability of rejecting $H$ when the population parameters have specified values $\sigma^2, b_1, \dots, b_r$ is expressed by the series


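$$I(G, \xi_\varepsilon) = e^{-G}\sum_{\nu=0}^{\infty}\frac{G^\nu}{\nu!}\int_{\xi_\varepsilon}^{1}\frac{z^{\frac12(r-q)+\nu-1}(1-z)^{\frac12(N-r)-1}}{B\left[\frac12(r-q)+\nu,\ \frac12(N-r)\right]}\,dz, \tag{2.2}$$

where

$$B(u, v) = \frac{\Gamma(u)\Gamma(v)}{\Gamma(u+v)}.$$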


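Now $G$ is a positive definite quadratic form in the parameters $b_{q+1}, \dots, b_r$, so that it vanishes if and only if the hypothesis is true. And if $0 < \xi_\varepsilon < 1$, then $I(G, \xi_\varepsilon)$ is a monotone increasing function of $G$. For by differentiating (2.2) we obtain

$$\frac{\partial}{\partial G} I(G, \xi_\varepsilon) = e^{-G}\sum_{\nu=0}^{\infty}\frac{G^\nu}{\nu!}\left\{\frac{B\left[\frac12(r-q)+\nu,\ \frac12(N-r);\ \xi_\varepsilon\right]}{B\left[\frac12(r-q)+\nu,\ \frac12(N-r)\right]} - \frac{B\left[\frac12(r-q)+\nu+1,\ \frac12(N-r);\ \xi_\varepsilon\right]}{B\left[\frac12(r-q)+\nu+1,\ \frac12(N-r)\right]}\right\}, \tag{2.3}$$

where $B(u, v; t) = \int_0^t z^{u-1}(1-z)^{v-1}\,dz$ denotes the incomplete Beta function.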


And from a property of incomplete Beta functions, which we shall demonstrate in the next section, it follows that each term in the series (2.3) is positive. Accordingly we have




Theorem I. The likelihood ratio test for the hypothesis that in a population of type (2.1) certain of the regression coefficients are zero, i.e., the hypothesis that $y$ is independent of the fixed variables $x^{q+1}, \dots, x^r$, is completely unbiased.

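Wilks [16] has noted that the ordinary analysis of variance and covariance amounts essentially to testing hypotheses of this nature by means of the function

$$F = \frac{1 - \lambda^{2/N}}{\lambda^{2/N}}.$$

Consequently such tests are also completely unbiased, since $F$ is a monotone increasing function of $\xi = 1 - \lambda^{2/N}$ and the region of rejection is then taken to be of the form $F \ge F_\varepsilon$, which is the same as $\xi \ge \xi_\varepsilon$.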
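The series (2.2) is straightforward to evaluate numerically, which gives a direct check of the monotonicity asserted in Theorem I. The sketch below is an illustration, not part of the original argument. It evaluates the ratio $B(u, v; t)/B(u, v)$ by the standard continued-fraction method (the algorithm given, for example, in *Numerical Recipes*), truncates the Poisson series at a fixed number of terms, and uses hypothetical function names and an arbitrary design $r = 4$, $q = 2$, $N = 15$.

```python
import math


def _beta_cf(a, b, x, max_iter=500, eps=3e-15):
    """Continued fraction for the incomplete Beta function (modified Lentz method)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        even = m * (b - m) * x / ((qam + m2) * (a + m2))
        odd = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        for coeff in (even, odd):
            d = 1.0 + coeff * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + coeff / c
            if abs(c) < tiny:
                c = tiny
            delta = d * c
            h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def reg_inc_beta(a, b, x):
    """Regularized incomplete Beta function B(a, b; x) / B(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    front = math.exp(log_front)
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_cf(a, b, x) / a
    return 1.0 - front * _beta_cf(b, a, 1.0 - x) / b


def rejection_probability(G, xi_eps, r, q, N, terms=400):
    """Series (2.2): Poisson(G)-weighted upper tail areas of Beta[(r-q)/2 + nu, (N-r)/2]."""
    u0, v = 0.5 * (r - q), 0.5 * (N - r)
    weight = math.exp(-G)  # e^{-G} G^nu / nu! at nu = 0
    total = 0.0
    for nu in range(terms):
        total += weight * (1.0 - reg_inc_beta(u0 + nu, v, xi_eps))
        weight *= G / (nu + 1)
    return total


if __name__ == "__main__":
    for G in (0.0, 0.5, 1.0, 2.0, 5.0):
        print(G, round(rejection_probability(G, 0.3, r=4, q=2, N=15), 4))
```

At $G = 0$ the sum collapses to the single central Beta tail area, which is the size $\varepsilon$ of the test; the printed values should increase with $G$.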


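3. An inequality relating to incomplete Beta functions. Let us write

$$B(u, v; t) = \int_0^t z^{u-1}(1-z)^{v-1}\,dz \qquad (0 \le t \le 1),$$

so that $B(u, v; 1) = B(u, v)$. Now, integrating by parts,

$$B(u+1, v; t) = -\frac{t^u(1-t)^v}{v} + \frac{u}{v}\,B(u, v+1; t).$$

The integrated term on the right is non-positive, so that

$$B(u, v+1; t) \ge \frac{v}{u}\,B(u+1, v; t), \tag{3.1}$$

in which the equality holds if and only if $t = 0$ or $t = 1$. Again, since

$$z^u(1-z)^{v-1} + z^{u-1}(1-z)^v = z^{u-1}(1-z)^{v-1},$$

we have

$$B(u+1, v; t) + B(u, v+1; t) = B(u, v; t). \tag{3.2}$$

Combining these results, we find that

$$\frac{u}{u+v}\,B(u, v; t) \ge B(u+1, v; t), \tag{3.3}$$

with equality only when $t = 0$ or $t = 1$. Since $B(u+1, v) = \frac{u}{u+v}B(u, v)$, we therefore have

Lemma 1: If $0 < t < 1$, then

$$\frac{B(u+1, v; t)}{B(u+1, v)} < \frac{B(u, v; t)}{B(u, v)}.$$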
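As a purely numerical illustration (no substitute for the proof above), the following sketch checks the identity (3.2) and the inequality of Lemma 1 on a small grid. It evaluates $B(u, v; t)$ by composite Simpson quadrature and, as a simplifying assumption, restricts itself to $u, v \ge 1$ so that the integrand $z^{u-1}(1-z)^{v-1}$ has no endpoint singularity. The function names are hypothetical.

```python
import math


def incomplete_beta(u, v, t, intervals=2000):
    """B(u, v; t) = integral_0^t z^(u-1) (1-z)^(v-1) dz by composite Simpson's rule.

    Intended for u, v >= 1, where the integrand is bounded and continuous.
    `intervals` must be even.
    """
    if t <= 0.0:
        return 0.0
    h = t / intervals

    def f(z):
        return z ** (u - 1) * (1.0 - z) ** (v - 1)

    total = f(0.0) + f(t)
    for k in range(1, intervals):
        total += (4.0 if k % 2 else 2.0) * f(k * h)
    return total * h / 3.0


def complete_beta(u, v):
    """B(u, v) = Gamma(u) Gamma(v) / Gamma(u + v)."""
    return math.exp(math.lgamma(u) + math.lgamma(v) - math.lgamma(u + v))


def lemma_holds(u, v, t):
    """Check B(u+1, v; t)/B(u+1, v) < B(u, v; t)/B(u, v) for 0 < t < 1."""
    left = incomplete_beta(u + 1, v, t) / complete_beta(u + 1, v)
    right = incomplete_beta(u, v, t) / complete_beta(u, v)
    return left < right


if __name__ == "__main__":
    print(all(lemma_holds(u, v, t)
              for u in (1, 2, 3) for v in (1, 2, 4) for t in (0.2, 0.5, 0.8)))
```

Because Simpson's rule is linear in the integrand, the pointwise identity behind (3.2) carries over to the approximations up to rounding error.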


4. The multiple correlation coefficient. Suppose the distribution law of the underlying population is known to be of the form

(4.1) f(x\ . . . , I x‘t‘, . . * , x") - 

The indices appearing in this expression take the values $i, j = 1, \dots, t$ and $p, q = t + 1, \dots, m$. The summation convention for repeated indices will be used; that is, an index repeated in a term implies summation over its range. We shall also have occasion to use indices $r, s$ with the range $r, s = 1, \dots, m$. The set of possible values of the $a$'s, $B$'s, and $C$'s is

$$\Omega:\quad \|B_{ij}\| \text{ positive definite};\quad -\infty < a^i < \infty;\quad -\infty < C_p < \infty.$$


We shall consider the $\lambda$ test for the hypothesis $H$ that $x^1$ is independent of the remaining variables $x^2, \dots, x^m$, i.e., that the parameters belong to that subset of $\Omega$ defined by

$$\omega:\quad B_{1k} = 0 \ \ (k = 2, \dots, t); \qquad C_p = 0.$$

Let us write $v^{rs} = \sum_{\alpha=1}^{N}(x^r_\alpha - \bar{x}^r)(x^s_\alpha - \bar{x}^s)$, and assume that the values of the fixed variables $x^p$ have been so selected that the matrix $\|v^{pq}\|$ is positive definite. The likelihood ratio can then be expressed in the form

$$\lambda = (1 - R^2)^{N/2}, \qquad R^2 = 1 - \frac{|v^{rs}|}{v^{11}\,v_{11}},$$

where $v_{11}$ is the complement of $v^{11}$ in the determinant $|v^{rs}|$. If $N > m + 1$, the general sampling distribution of $R^2$ (the multiple correlation coefficient between $x^1$ and the $m - 1$ other variates), for this case in which $x^2, \dots, x^t$ are subject to sampling variation and the remainder are fixed, is


(4.2) 


F(R‘)d{R‘) m(N-m)] 

V V f - py(p^)'(fi^)'‘~^^rn^(Ar -i) + ^ + y] 

^■■0 I'—O M!vir[KAr - 1) -b p]r[i(m - i) + p + ^ ^ 


where 


1 


\Bii\ 




^ ri^ 




This distribution was first obtained by Wilks [13], although Fisher [3] had 
previously treated the two extreme cases in which (1) all independent variables 
are subject to sampling fluctuation, and (2) all independent variables are fixed. 

To simplify the presentation, let us denote by $\rho$ and $\sigma$ the two population parameters defined above, and note that $\sigma = 0$ if and only if $C_p = 0$ $(p = t + 1, \dots, m)$, while $\rho = 0$ if and only if $B_{1k} = 0$ $(k = 2, \dots, t)$, so that $\sigma = \rho = 0$ means that the hypothesis $H$ is true. On any alternative hypothesis, one or the other or both of these quantities will be positive. Let the region of rejection be taken to be


$$R_\varepsilon^2 \le R^2 \le 1,$$

which corresponds to

$$0 \le \lambda \le (1 - R_\varepsilon^2)^{N/2}.$$
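The criterion of this section depends on the data only through the matrix $\|v^{rs}\|$. As a small illustration with hypothetical function names, the sketch below computes $R^2$ from the standard determinant relation $1 - R^2 = |v^{rs}|/(v^{11} v_{11})$, with $v_{11}$ the complement (cofactor) of $v^{11}$, and then the corresponding $\lambda = (1 - R^2)^{N/2}$; for $m = 2$ it reduces to the squared sample correlation.

```python
def determinant(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [list(row) for row in matrix]
    n = len(a)
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if a[pivot][col] == 0.0:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det
        det *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return det


def multiple_correlation_sq(v):
    """R^2 of x^1 on x^2, ..., x^m from ||v^{rs}||: 1 - R^2 = |v| / (v^{11} v_{11})."""
    minor = [row[1:] for row in v[1:]]  # v_{11}: complement of v^{11}
    return 1.0 - determinant(v) / (v[0][0] * determinant(minor))


def criterion(v, n_obs):
    """lambda = (1 - R^2)^(N/2)."""
    return (1.0 - multiple_correlation_sq(v)) ** (n_obs / 2.0)


if __name__ == "__main__":
    v = [[4.0, 2.0, 1.0], [2.0, 9.0, 3.0], [1.0, 3.0, 5.0]]
    print(multiple_correlation_sq(v), criterion(v, 12))
```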



The probability of rejecting $H$ is then


(4.4) 


m ft) - .-i s?;« - 


I — 1) + M + ^ \Qf — »»)] * 


We shall show that $I(\rho, \sigma, R_\varepsilon)$ is a strictly monotone increasing function of $\rho$ for each $\sigma$, and that $I(0, \sigma, R_\varepsilon)$ is a strictly monotone increasing function of $\sigma$.

First consider $\partial I/\partial \rho$. We can write (4.4) in the form


I(p, 




1 




^SiuiriKAr-i) + m 1 t=iv\ 


.2:^,(i-p)*<-«+Vm.m 


where 


„ _ rfifjv — + u + fl — 1) + A* + >'i — n»); 5i] 

V.., - VmN 1) + M + fl- _ 1)-+- + • 

Then, formally, 


(1 - - 1) + mV. 

b«M) Vl B -0 Vl 




Taking out the factor (1 — p) 

r-l 




H ~ £ ^ <pm.» ~ 2 — I [i(^ — 1) + fi]<pi,.w 

l*■■0 Vl ‘ 


, we have left 

•0 r 

-fl 

B-0 I'! 


00 w 


*= £ -, {<pi..»+i - - 1) + M + "]«»».»}• 

WomO Vl 


And the expression <p„,,+i — [K'^ — 1) + m + is the same as 

1 1 ~ 1) 4' M + y + 1, KW — m), fi,] 

\ B[J(m — 1) + M + •' + 1. — »»)] 

B[^(m — 1) + M + I*, ~ w»). ^tl\ 

B[i(m - 1) + M KA^ - m)] I 


r(KiV-l) + M + »' + 


and is therefore positive, by Lemma 1. Consequently

$$\frac{\partial}{\partial \rho} I(\rho, \sigma, R_\varepsilon) \ge 0,$$

with equality holding only if $\rho = 1$, or if the critical region is taken as the whole interval or the null set.
We have yet to investigate $\frac{\partial}{\partial \sigma} I(0, \sigma, R_\varepsilon)$. In this case (4.4) becomes

(«) A) ■ 

(Note that this agrees with (2.2) if we make use of the relations r = m, g »= 1, 
and = 2<r*.) We then obtain 

~T(n a J5 ^ _ --S V i — 1) + M + 1» — »») ; ^«] 

B[i(m - 1) + M + 1, i(N - mf] “ 

_ B[K»» - 1) + M, i(N - m); fj.]\ 
B[J(m - 1) + M, KJV - m)] 7 

which the lemma shows to be positive when $0 < R_\varepsilon^2 < 1$.

This concludes the proof of

Theorem II. If the underlying population has a distribution law of the form (4.1), then the likelihood ratio test for the hypothesis that $x^1$ is independent of $x^2, \dots, x^m$, where $x^{t+1}, \dots, x^m$ are fixed and $x^2, \dots, x^t$ are subject to sampling variation, is completely unbiased.

5. Mutual independence of several sets of random variables.¹ Let the distribution law of the $m$-variate population be of the form

$$f(x; B, a) = \frac{|B_{ij}|^{1/2}}{(2\pi)^{m/2}}\, e^{-\frac{1}{2}B_{ij}(x^i - a^i)(x^j - a^j)}. \tag{5.1}$$


Here $\Omega$ is the set $\|B_{ij}\|$ positive definite; $-\infty < a^i < \infty$. Suppose we wish to test the hypothesis $H_I$ that the variates $\{x^1, \dots, x^{m_1}\}, \dots, \{x^{m_{p-1}+1}, \dots, x^{m_p}\}$ are mutually independent in sets [14], where $0 = m_0 < m_1 < \dots < m_p = m$. Then the $\omega$ set is that defined by

$$\|B_{ij}\| = \|B_{i_1j_1}\| + \dots + \|B_{i_pj_p}\| = \|B_1\| + \dots + \|B_p\|,$$

that is, we have $B_{ij} = 0$ unless the indices $i$ and $j$ both relate to the same set of variates.

Associated with the population of random samples $O_N$ $(N > m + 1)$ drawn from a universe characterized by (5.1), we have the distribution function

$$P(x; B, a) = \frac{|B|^{N/2}}{(2\pi)^{Nm/2}}\, e^{-\frac{1}{2}\sum_{\alpha=1}^{N} B_{ij}(x^i_\alpha - a^i)(x^j_\alpha - a^j)}.$$

The maximum of $P$ with respect to variations of the parameters $B_{ij}$, $a^i$ in $\Omega$ is

$$P_\Omega = \left(\frac{N}{2\pi e}\right)^{Nm/2} |v^{ij}|^{-N/2},$$

¹ In this and in subsequent sections an index occurring both above and below indicates summation in accordance with the usual convention.





where

$$v^{ij} = \sum_{\alpha=1}^{N}(x^i_\alpha - \bar{x}^i)(x^j_\alpha - \bar{x}^j).$$


And the maximum when the parameters are restricted to $\omega$ is

$$P_\omega = \left(\frac{N}{2\pi e}\right)^{Nm/2} (v_1 \cdots v_p)^{-N/2},$$

where $v_\mu$ stands for the determinant of the $v$'s connected with the $\mu$th set of $x$'s.
Thus the appropriate likelihood ratio is given by

$$\lambda_I = \left(\frac{|v^{ij}|}{v_1 \cdots v_p}\right)^{N/2}.$$
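For data in hand, $\lambda_I$ requires only the sample matrix $\|v^{ij}\|$ and the determinants of its diagonal blocks. The following sketch is illustrative; its function names and the `block_sizes` argument (the sizes $m_1, m_2 - m_1, \dots$ of consecutive sets) are assumptions. It computes $\lambda_I$, and its test checks two properties: with two single-variable sets $\lambda_I$ reduces to $(1 - r^2)^{N/2}$, where $r$ is the sample correlation, and $\lambda_I$ is unchanged when the variates of one set are replaced by nonsingular linear combinations of themselves.

```python
def determinant(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [list(row) for row in matrix]
    n = len(a)
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if a[pivot][col] == 0.0:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det
        det *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return det


def scatter_matrix(data):
    """v^{ij} = sum_a (x^i_a - mean^i)(x^j_a - mean^j); data[a][i] holds x^i_a."""
    n, m = len(data), len(data[0])
    means = [sum(row[i] for row in data) / n for i in range(m)]
    return [[sum((row[i] - means[i]) * (row[j] - means[j]) for row in data)
             for j in range(m)] for i in range(m)]


def lambda_independent_sets(data, block_sizes):
    """lambda_I = (|v| / (v_1 ... v_p))^(N/2) for consecutive sets of the given sizes."""
    v = scatter_matrix(data)
    if sum(block_sizes) != len(v):
        raise ValueError("block sizes must add up to the number of variates")
    denominator = 1.0
    start = 0
    for size in block_sizes:
        block = [row[start:start + size] for row in v[start:start + size]]
        denominator *= determinant(block)
        start += size
    return (determinant(v) / denominator) ** (len(data) / 2.0)
```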


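It is easy to see that the value of $\lambda_I$ is unaltered if we replace $x^i_\alpha - a^i$ by $x^i_\alpha$, so that we can express the probability that $\lambda_I$ will lie between $0$ and $\lambda_\varepsilon$ in the form

$$\frac{|B|^{N/2}}{(2\pi)^{Nm/2}}\int_{\lambda_I \le \lambda_\varepsilon} e^{-\frac{1}{2}\sum_{\alpha} B_{ij}x^i_\alpha x^j_\alpha}\, dx^1_1 \cdots dx^m_N.$$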


Furthermore, $\lambda_I$ is invariant under the operation of replacing any $x$ by a linear combination of $x$'s belonging to the same set. And since the assumption that $\|B_{ij}\|$ is positive definite implies that the matrices $\|B_{i_\mu j_\mu}\|$ have the same property, we can transform the $x$'s in each set among themselves by orthogonal transformations in such a way as to reduce each of the expressions $B_{i_\mu j_\mu}x^{i_\mu}x^{j_\mu}$ to sums of squares. Thus we have


(6.2) 

where 

(6.3) 

(8.4) 


KB, X.) = 


riW* 


r - S *i,*i*i 

/ c dx}..1dxj; = /(B* X.), 

''x<x. 


Bt„„ = 

Bh. = 0 


(^*1 j *(i j J»* I “ m,i_i H” 1 , • • ' , w,,) , 

*'• ^ h ( 


and the subscripts on the indices indicate the sets of values over which they range; e.g., $i_2$ runs over the numbers corresponding to the columns of the matrix $\|B_2\|$. From (5.3) and (5.4) it is clear that $\|B^*_{ij}\|$ reduces to a diagonal matrix when $H$ is true.

In order to show that the test is locally unbiased, we may consider the derivatives of $I(B^*, \lambda_\varepsilon)$ with respect to the $B^*$'s, for the $B^*$'s are linear functions of the $B$'s; and the positive definiteness of one matrix of second partials implies that of the other. We have at once


/ dB*\ ^ \ „ 

unless the second derivative is taken twice with respect to the same B*. Thus 



t JL. 
-2~~ / 


- 2 


r’e 


dx, 


where the B* indicates that the B's have the diagonal form associated with H. 
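And since whenever the point $(x^1, \dots;\ x^{i_\mu}_1, \dots, x^{i_\mu}_N;\ \dots, x^m)$ is in the region $\lambda < \lambda_\varepsilon$, so also is the point $(x^1, \dots;\ -x^{i_\mu}_1, \dots, -x^{i_\mu}_N;\ \dots, x^m)$, it follows that

$$\frac{\partial}{\partial B^*_{i_\mu j_\nu}} I(B^*, \lambda_\varepsilon) = 0 \qquad (\mu \ne \nu).$$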


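Similar considerations show that the non-repeated second derivatives

$$\frac{\partial^2}{\partial B^*_{i_\mu j_\nu}\,\partial B^*_{k_\rho l_\sigma}} I(B^*, \lambda_\varepsilon),$$

which involve integrals of the products $\left(\sum_{\alpha=1}^{N} x^{i_\mu}_\alpha x^{j_\nu}_\alpha\right)\left(\sum_{\beta=1}^{N} x^{k_\rho}_\beta x^{l_\sigma}_\beta\right)$ over the region $\lambda < \lambda_\varepsilon$, must vanish.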

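Finally, we must show that the repeated second derivatives are positive when evaluated at a point in $\omega$, except of course in the trivial cases $\lambda_\varepsilon = 0$, $\lambda_\varepsilon = 1$, when they must be zero. In order to do this, we shall make use of the fact that the $v$'s which go to make up $\lambda$ have the Wishart distribution [17]

$$\frac{|B|^{\frac12(N-1)}\,|v|^{\frac12(N-m-2)}\,e^{-\frac12 B_{ij}v^{ij}}}{2^{\frac12 m(N-1)}\,\pi^{\frac14 m(m-1)}\prod_{i=1}^{m}\Gamma\left[\frac12(N-i)\right]}\,dv. \tag{5.5}$$

(Because of the relation $v^{ij} = v^{ji}$, only $\frac12 m(m+1)$ of the $v$'s appear as differentials.) It will be useful to have the notation

$$G(B, N-1, m) = \frac{|B|^{\frac12(N-1)}}{2^{\frac12 m(N-1)}\,\pi^{\frac14 m(m-1)}\prod_{i=1}^{m}\Gamma\left[\frac12(N-i)\right]}, \qquad V(B, N-1, m) = |v|^{\frac12(N-m-2)}\,e^{-\frac12 B_{ij}v^{ij}}.$$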


With the aid of (5.5) we shall now compute the moments 





for the case in which the matrix $\|B_{ij}\|$ has the form

$$\|B_{ij}\| = \left\|\begin{array}{ccc|ccc}
B_{11} & \cdots & B_{1m_1} & 0 & \cdots & B_{1m}\\
\vdots & & \vdots & \vdots & & 0\\
B_{m_11} & \cdots & B_{m_1m_1} & 0 & \cdots & 0\\ \hline
0 & \cdots & 0 & & & \\
\vdots & & \vdots & & \|S\| & \\
B_{m1} & 0\;\cdots & 0 & & &
\end{array}\right\|, \tag{5.6}$$

where $\|S\|$ stands for $\|B_2\| + \dots + \|B_p\|$, and all other $B$'s, except those indicated, are zero. Let us designate by $(S)$ the set of $v^{ij}$ which correspond to the rows and columns of $S$, and by $(v - S)$ the remaining $v$'s. We then remark that the result of integrating (5.5) with respect to the $v$'s in $(v - S)$ is to reduce it to the corresponding distribution for the variables in the set $(S)$, thus:


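$$G(B, N-1, m)\int V(B, N-1, m)\, d(v - S) = G(\bar{S}, N-1, m - m_1)\, V(\bar{S}, N-1, m - m_1), \tag{5.7}$$

where $\|\bar{S}_{kl}\|$ is the inverse of the matrix obtained by inverting $\|B_{ij}\|$ and striking out the first $m_1$ rows and columns, that is

$$\bar{S}^{kl} = B^{kl} \qquad (k, l = m_1 + 1, \dots, m),$$

$\|\bar{S}^{kl}\|$ and $\|B^{ij}\|$ denoting the inverses of $\|\bar{S}_{kl}\|$ and $\|B_{ij}\|$.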


Then, 


G(B, N - 1, 



V(B,N- l,m)d(v- ») 


can be written as 


(5.8) 


ol S’- - >+ «)/».-*••■ -i* 

X V(B,N-l+2h,m)d(p-S) 

- G(b!n-I'm - 1 + 2*. - "O 

X • • • v^V(S, W - 1 + 2A, m - mi). 


It can be seen from (5.6) that

$$\|\bar{S}\| = \|B_2\| + \dots + \|B_{p-1}\| + \|\bar{S}_p\|,$$

since of all the rows and columns of $\|B_{ij}\|$ which are involved in $\|\bar{S}\|$ it is only the last in which a non-zero element appears outside of the blocks $\|B_2\|, \dots, \|B_p\|$. Consequently, the $v$'s corresponding to the determinants $v_1, \dots, v_p$ are independently distributed, so that if in (5.8) we integrate out all the remaining $v$'s but these, we shall be left with a product of factors

0{B,N-hm) flO(Bt,N-l+2h,kt) 

OiB, N- l + 2h,myh^ OiB,, N - 1, k,) 

X G(Bt, AT - 1, jfc,)pr*F(B„ JV - 1 + 2A, *,) 


X 


0(Bp, N -l+2h, kp) 
0(Sp,N-l,kp) 


.0(Bp,N 


1, kp)vtV{Bp, JV - 1 + 2A, kp), 


where $k_\mu$ stands for the order of $\|B_\mu\|$. And this, when integrated with respect to the $v$'s in $v_1, \dots, v_p$, yields

G{B, N - I, m) ff G{Bi ,N-\ + 2h,h) JV - 1 + 2A, kp) 

G{B, N - 1 -f2h, m ) • fi ■ G{B;;N - 1, k,) ^ " G(Bp, N - TW ’ 


which, because of the definition of the G’s, reduces to 


TT r[i(Ar — t) + A] TT r[^(iv — *)] 
fi r[^(N - i)] ■ U M mN -t) + h] 


XB-*Bi ... 
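When $H$ is true the factors involving the $B$'s cancel and only the product of Gamma-function ratios remains. Assuming it takes the standard form $E[(\lambda_I^{2/N})^h] = \prod_{i=1}^{m}\frac{\Gamma[\frac12(N-i)+h]}{\Gamma[\frac12(N-i)]}\prod_{\mu=1}^{p}\prod_{i=1}^{k_\mu}\frac{\Gamma[\frac12(N-i)]}{\Gamma[\frac12(N-i)+h]}$ (stated here from the Wishart distribution (5.5), not read from the display above), the sketch evaluates it with log-gamma functions. Its test checks the formula against the case of two single-variable sets, where $\lambda_I^{2/N} = 1 - r^2$ and, under $H$, $1 - r^2$ has the Beta distribution with parameters $\frac12(N-2)$ and $\frac12$. The function names are hypothetical.

```python
import math


def log_moment_under_h(h, n_obs, block_sizes):
    """log E[(lambda_I^(2/N))^h] under H, from the product of Gamma-function ratios."""
    m = sum(block_sizes)

    def log_ratio(k):
        # sum over i = 1..k of log Gamma[(N - i)/2 + h] - log Gamma[(N - i)/2]
        return sum(math.lgamma(0.5 * (n_obs - i) + h) - math.lgamma(0.5 * (n_obs - i))
                   for i in range(1, k + 1))

    return log_ratio(m) - sum(log_ratio(k) for k in block_sizes)


def moment_under_h(h, n_obs, block_sizes):
    """E[(lambda_I^(2/N))^h] when the sets are mutually independent."""
    return math.exp(log_moment_under_h(h, n_obs, block_sizes))


if __name__ == "__main__":
    # hypothetical example: N = 12 observations on three sets of sizes 2, 1, 2
    print([round(moment_under_h(h, 12, [2, 1, 2]), 4) for h in (1, 2, 3)])
```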


Denoting the product of ratios of $\Gamma$'s by $K_h$, and recalling the form of $\|B_{ij}\|$, we therefore have


(6.9) 

with 


r p* ■ 

= KkBIB'-'' 

• • • Vp^ 

Bn • * • 

Blm, 0 . . . OBln 


0 

Bmil • • • 

Bmjmi 0 • • . 0 

0 

• 

0 

• 

• 

^ I|5p11 

0 

JSiniO • • • 

0 


But it is not difficult to see that under the condition (5.6), the matrix || .§» || 
is also the inverse of the matrix obtained by striking out the first m rows and 
columns in the inverse of 1 1 B' 1 1 . Making use of this relation, we can apply the 
Jacobi theorem to (5.9), and put that expression in the form 


where H Bj || is the matrix in the upper left hand comer of H B' || , namely 
\\Bhh\\. 




Let the subscript 0 on a, B stead for the result of replacing bjr + 
. For sufficiently small values of the jS’s the matrix || But |1 ftiU be 
positive definite, so that we shall have 

which we can put in the form 

(6.10) K' f l 

Wilks [13] has shown how to generate moments of determinants by the device 
of replacing fii,/, by and integrating with respect to the f’s from 

— 00 to 00 . Applying this process 2h times to the left hand side of (5.10) gives 

/ (— ^ Y F(B^, dv, 

J \»1 • • . Vp/ 

which when multiplied by x~*‘*B*‘*““ 3 rield 8 

B[(X*'")*] 


when the $\beta$'s are set equal to zero.

To obtain the value of this expression, we may perform the same operations 
on the right hand side of (5.10). But before so doing, we shall put B^ in a 
more convenient form. We have 

D D S D* B B**** 

, Jjp « Dip* IJ — •'Oipf 

where B"" is the inverse element of B«« in jj B || , and Bi“ is the cofactor of 
Bii 4 in Bi/}, the result being obtained by expanding Bf according to minors of 
the first row and first column. Similarly, 


( 6 . 11 ) 

From (6.11) we have 


B-Bi.B-B!«.BB’"".B( 


/u 


B 

Bj • • • Bp 


>'U 


i-bJ,B~-.^, 


so that if we put B-BT^ • • • B7* « A, we find that 


Bfi * 



BLB-". 


ni 


K 

Bi 


Bi BS‘\ 

•b?*b^/ 
Thus the result of multiplying (5.10) through by the corresponding determinant (where no $\beta$'s are substituted in this determinant) can be put in the form


(6.12) 






Expanding the expression in curled brackets, we get 




00 

s 

M-0 


r[i(iV — 1) + J'] or D-li(iV~l)+fc+rl 

^ir[«w - 1)1 ®“ 


(1 - A)’. 


If we let Bifit stand for the result of replacing JSn by Bu — I in Big , we can write 
this as 


(5.13) 


• KAT-l) ^ r[i(JV — 1) + v] KVta'^h-' D»(#-l>+>’ 

w r[|(iv — 1 ) + ft] ' d’ p-u(w-«+M 
rii(iV - i) + ft + r] a<' ^ 


the derivatives being evaluated at t = 0. 

Now Wilks’ results show that the operation of introducing /9<iJi + 

Bifii to replace diui and integrating with respect to the f’s, when repeated 2ft 
times on produces 


IT 


Wl* D— |(iV— 1) 

Du 


fV r[i(N - i)] 
LhvmN-'t^ + h] 


when the $\beta$'s are finally set equal to zero. Reversing the order of summation, differentiation and integration in (5.13), we thus obtain


...nk 


r[i(JV - 1 )] V rlKAT - i) + .-I 


(6.14) 


Now 


S t[UN - t) + ft] “ f=i - 1)] 

mm - 1) + ft] /3' B-Kiv-iA 
X (1 - A) (Bx ) Bx Bu 



vm - 1) + v] 

mm-i)] 




so that (5.14) becomes 

m(N-t)] r[i(Ar- i)] 

ii r[i(N - ») + ft] « k! mm- 1)1 

V./, m(N-l) + ft] r[i(N-l) + r] 
r[i(JV - 1 ) + ft + .-]' r[i(N - 1 )] • 
From this it appears that the $h$th moment of $\lambda_I^{2/N}$ is given by


(6.16) 


m/. */Ns*i _ ft - *) + ft ft rif (AT - m 

E[i\ ) ] - 11 .11 11 rtiCAT -i) + h] 

V ^ fi - \y — 1) + "I 


■ mN-i)+y] vm-D + h] 
T[^{N - ly + h + p]’ r[i(Ar- i)i : 

A considerable amount of cancellation will take place in (5.16), for $m$ is greater than any $k_\mu$. Suppose the largest $k_\mu$ is $k_{\mu'}$. Then we can cancel its product into the first one, with the assurance that there will be at least one factor

(fi lfl> 

to cancel the corresponding factor under the summation sign. Hence we have 

mrMH, _ tt r[§(iNr - i) + h] fV, fr,, rm - »)] 

mN-i)\ ‘ii M mN-i) + h] 


(6.17) 


F[(x‘'T]= n 

i—Ast'+l 


mN 


w V /I _ 4\»r[KAr — 1) + v] r[4(Ar — i) + r] 

^ ^ ’ r!r[KA^-i)] 'r[i(Ar - i) + a + „)» 

where $\prod'$ indicates that $\mu'$ has been omitted, and $\prod''$ indicates that one factor of the kind shown above has been cancelled. Then we can take out the factor $i = m$ in the first product, putting it under the summation sign, where, together with the final factor in each term of the sum, it gives rise to the combination

r[J(Ar - 1) + .] r[i(iv - m) + A]r[K»» - D + W 

mN-m)]mm-l) + 'v\' T[m-l) + h + p] 

After making this reduction, we obtain 


(6.18) 


Eicx’")*]. if 


i) + A] fr, fV/, rlKAT - »•)] 
-i)] 74 M r[f(JV-t) + Al 


V ('1 —!) + »'] Bl\(N — m)+h, i(m — 1) + r] 

^ vinKAT- 1)] B[§(iV - m), Km - 1) + r] * 

The products of ratios in the first part of (5.18) are of the type discussed by Wilks in connection with integral equations of type B [12]. It follows from his results that $\lambda_I^{2/N}$ is distributed like the product

$$z\,\theta_1 \cdots \theta_{m'} \qquad (m' = m - k_{\mu'} - 1),$$

where $z$ and the $\theta$'s are independently distributed, with the distribution of the $\theta$'s given by
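where the $b_i$ and $c_i$ are constants which depend on $N$, $m$, and the sizes of the blocks, but not on $\Delta$, and the distribution of $z$ is given by

$$F(z) = \Delta^{\frac12(N-1)}\sum_{\nu=0}^{\infty}(1-\Delta)^{\nu}\,\frac{\Gamma\left[\frac12(N-1)+\nu\right]}{\nu!\,\Gamma\left[\frac12(N-1)\right]}\cdot\frac{z^{\frac12(N-m)-1}(1-z)^{\frac12(m-1)+\nu-1}}{B\left[\frac12(N-m),\ \frac12(m-1)+\nu\right]}.$$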


Consequently, the probability that X lies between zero and X, is 


J(A, X,) = A*'^-*' / E 


(1 - A)’ 


- 1) + v] 
viriKAT-i)] 


xm 


m(N - m), f(m - 1) + r]***®®’ 


where the integral is to be extended over the region

$$S:\quad 0 < z\,\theta_1 \cdots \theta_{m'} < \lambda_\varepsilon^{2/N}, \qquad 0 < \theta_i < 1, \qquad 0 < z < 1.$$

Let us integrate first with respect to $z$ and then with respect to the $\theta$'s; we have


(6.19) 


J(A, X.) - / 


Ed -A)' 


F-O 


nm - 1) + v] 
virSCAT -1)] 


X 


B'ftCAT - m), i(m - 1) + y; 
Bim - m), Um-- 1) + y] 


where St is the set Ilffi < X*'" ,0 < fft < 1, and 


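$$B'(u, v; \varphi) = \int_0^{\varphi} z^{u-1}(1-z)^{v-1}\,dz = \int_{1-\varphi}^{1} z^{v-1}(1-z)^{u-1}\,dz = B(v, u) - B(v, u; 1-\varphi), \tag{5.20}$$

$\varphi(\theta)$ being the upper limit for $z$ for fixed $\theta$. It is clear that the subset of $S_1$ for which $\varphi(\theta) < 1$ will not be of measure zero in the $\theta$-space, since we assume that $0 < \lambda_\varepsilon < 1$.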

The relation between (5.19) and the corresponding expression for the multiple correlation coefficient without fixed variates (the case $\sigma = 0$ in (4.4)) may be clearer if we put

(5.21) p = 1 - A = 

where 5"*" is the inverse of Bmm in H 5 H , and 5“ is the inverse of fin in |1 5i || . 
Then the required probability of rejection when $\rho$ has any fixed value is


HP, 1 - xr'^) 



Kw-i) r[i(Af — 1) + s] 

nm-i)] 

y Bl4(»» “ 1) + »'> hiN — to), 
BIJ(to -l) + y, UN - 


1 - 

TO] 


^mds, 


where we have used the relation (5.20) between the incomplete Beta functions. Differentiating with respect to $\rho$ before performing the integration with respect to the $\theta$'s, we find by a computation similar to that in section 4 that each term in the series is positive except where $\varphi(\theta) = 1$; so that we have


And by (5.21), we then have


3^/ 


> 0 . 


Since the argument is clearly independent of which $B_{i_\mu j_\nu}$ $(\mu \ne \nu)$ we take, it follows that the test is locally unbiased. We have therefore proved:

Theorem III. If $x^1, \dots, x^m$ have the joint normal distribution (5.1), then the likelihood ratio test for the hypothesis that the $x$'s are independent in sets is locally unbiased.

In certain types of statistical material it may be important to consider, not the independence of the $x$'s themselves, but that of their deviations from regression functions. For example, in the case of several related time series, it may be desirable to eliminate the trend of each $x^i$ by means of, say, a second degree polynomial in $t$. Consider then in general a population whose distribution function is of the form




$(\nu, \nu' = m + 1, \dots, m + q)$

with unknown $B_{ij}$ and $C^i_\nu$. The likelihood ratio for testing the hypothesis $H_{II}$ that the sets of deviations

$$x^1 - C^1_\nu x^\nu, \dots, x^{m_1} - C^{m_1}_\nu x^\nu;\ \dots;\ x^{m_{p-1}+1} - C^{m_{p-1}+1}_\nu x^\nu, \dots, x^m - C^m_\nu x^\nu$$

are independent is


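where

$$v'^{ij} = \sum_{\alpha}\left(x^i_\alpha - c^i_\nu x^\nu_\alpha\right)\left(x^j_\alpha - c^j_{\nu'} x^{\nu'}_\alpha\right),$$

and $c^i_\nu$ is the usual least squares estimate of $C^i_\nu$, given by

$$c^i_\nu\, a^{\nu\nu'} = a^{i\nu'},$$

with

$$a^{rs} = \sum_{\alpha} x^r_\alpha x^s_\alpha \qquad (r, s = 1, \dots, m + q).$$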

An examination of the characteristic function of the $v'^{ij}$ shows that their distribution law is the same as that of the $v^{ij}$ of the preceding discussion, except for the fact that $N - 1$ is replaced by $N - q$. Consequently the above results on freedom from bias, and also those of the next section, apply equally well to the $\lambda_{II}$ test for the independence of deviations from regression functions.
6. On the moments of $\lambda_I$. Although we have succeeded in proving the unbiased nature of the preceding test only in the local sense, we can show that the moments of the criterion have a property which seems very closely related to that of furnishing a completely unbiased test. For it can be shown that each of the quantities $E[(\lambda_I^{2/N})^h]$ is greater when $H_I$ is true than when any alternative $H'$ holds. It will perhaps be sufficient to prove this statement in detail for the case where $h = 1$ and where $H_I$ is the hypothesis that the matrix $\|B_{ij}\|$ has the form

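$$\|B_{ij}\| = \left\|\begin{array}{cccc|c}
B_{11} & B_{12} & 0 & 0 & 0\\
B_{21} & B_{22} & 0 & 0 & 0\\
0 & 0 & B_{33} & B_{34} & 0\\
0 & 0 & B_{43} & B_{44} & 0\\ \hline
0 & 0 & 0 & 0 & \|B_{i_3j_3}\|
\end{array}\right\|;$$

in the notation of the preceding section we then have $i_1, j_1 = 1, 2$; $i_2, j_2 = 3, 4$; $i_3, j_3 = 5, \dots, m$.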

Even when H is not true we find that 


(6.r) E[|«*'|*|t»**''r] = 


G(B, N - I, m) 0(B, N - I + 2h,m- 
0(B, N — 1 + 2A, m) g(S, at — 1, m — 4) ’ 


where = B'*’*. Using the definition of the G’s in section 5 and the Jacobi 
theorem, we can write (6.1) in the form 

Bii r I r*] = KkB-^ 

where $B$ is the determinant of the matrix composed of the first four rows and columns of $\|B_{ij}\|$. In the general case we therefore have


Bu Bu Bjt Bit 
Bn Bn Bn Bu 

ll^ll 

Bn Bn Bn Bu 
Bn Bo Bn Bu 

Thus if we set A « 1, and replace Bi„-, and B<,;, by B<,,-, + 

and Bi,/, -f respectively, indicating this replacement by a 

prime, we obtain 


( 6 . 2 ) 




Treating $B'$ as a bordered determinant, we can reduce it to

B' = B(«)(l + bum® 

- Bw(l + B|{l5fiX)(l + 

- B„,(i + + Ba«)r)(i + Baa)f{W) 

= B(1 + B**'‘{if{lf)(l + BaCfirtyV)(l + 

where the subscripts on the $B$'s indicate the sets of $\xi$'s still contained in the
determinants, and $\|B'^{ij}\| = \|B'_{ij}\|^{-1}$. Similarly,

(6.4) B' 

- 5(1 + + SJiS'sM’Xi + 5iliKM>)(i + SifeffO. 

the inverse now being taken with respect to $\|B\|$.

But between, say, B'di) and Bas* , there is the relation 

(6.5) Baij = B\li^ - Bai5B(«)<.,.B(U5 , 

where 1| Bdjx,,, || = H Bd** IT*, that is, the invert of the matrix obtained by 
deleting the first four rows and columns of || Bdi) ||. Consequently 

Baa€J!’{jr < 

with equality holding only for those values of the $\xi$'s for which

f<TBaa = 0 t, = 6, . . . , w. 

And this set of $\xi$'s will not make up the entire $\xi$ space unless $\|B_{ij}\| = \|B\| + \|B_{i_3j_3}\|$. Applying the same kind of reasoning to the other quadratic forms in (6.4), we can therefore show that

< B-‘ / (n- .. . (1 + d^. 

The last form can be reduced to a sum of squares with unit coefficients by a
linear transformation of the $\xi$'s; thus

( 6 . 6 ) 

sir'll Bilk |■'(l + i‘'" Sil’sS’)-*"'" • • • (1 + 

And by making use of the fact that 

• B(uj) « Bail) • I B(a) <,/, I , 
we can express the right-hand side of (6.6) as

s-‘ / fiu . 1 1-^(1 + ff . . . (1 + «• 

This in turn becomes [c. f. (6.4)] 

«■*/ 1 «(»»./. |■‘(l + 

X <1 + « 

= / 1 r(i + + S(i>' 

X (1 + «. 

At this stage we can write 

I 1 = 1 Sm,-. 1(1 + 

where 1| 5(iV’‘ 11 = || Baxui ir\ and apply the relation 

BVr' = 6\ii' - , II li = 11 ir‘. 

Therefore, 

unless = 0 (f» = 3, 4). We can thus continue as follows 

< 1 Bi,i, r/ (1 + 

X (1+ 2{.^”{.-r)~*"(i + 

Transforming the $\xi$'s, we get

1 fii./, r / iBaV^’ 1"*(1 + 

X (1 + S€,”^{i;’)“*^(l + d{. 

Since 1 T* = 1 ■Bfoi,;, |, this becomes 

1 &,/. 1"* / (1 + + 2(i"{if)-“^+“ 

X (1 + 2(if t{f)"*''(l -b di 

= / (1 +2€i“0’‘''(l-b 2{i« «<"’)■*'''■*•” 
X (1 + 2{if «{r)'**(l + 2{i;’ jU))-»(^+i) ^ 


Collecting these results, we finally obtain 

i K. / (1 + + iCtS’)-"'" 

X (1 + di 

with equality only in case $H_I$ is true. But the right side of (6.7) is the first moment of $\lambda_I$ computed under the hypothesis $H_I$, while the left side gives the corresponding moment in the general case.

The possibility of carrying out this reduction for the case in which the matrix
$\|B_{ij}\|$ has more than two blocks, or blocks of unequal size, seems sufficiently
clear. And to obtain higher moments, we have only to introduce the proper
number of $\xi$'s into each set. We then have:

Theorem IIIa. Let $\lambda_I$ be the likelihood ratio appropriate to testing the hypothesis $H_I$ that the normally distributed variates $x^1, \dots, x^m$ fall into the mutually independent sets $x^1, \dots, x^{m_1};\ \dots;\ x^{m_1 + \cdots + m_{k-1} + 1}, \dots, x^m$. Then the expected value of $\lambda_I^h$, $h = \tfrac12, 1, \tfrac32, \dots$, is greater under the null hypothesis $H_I$ than under any alternative hypothesis in $\Omega$.
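The inequality in Theorem IIIa can be illustrated, though of course not proved, by a small Monte Carlo sketch. It assumes that the criterion is the determinant ratio $W = |a^{ij}| / \prod_k |a^{i_kj_k}|$ built from corrected sums of squares and products (our reading of $\lambda_I$ up to a power), and it compares $E(W^h)$ for $h = \tfrac12$ and $h = 1$ under a block-diagonal covariance and under one arbitrarily chosen alternative. The helper names, sample size, and alternative covariance are illustrative assumptions, not taken from the paper.

```python
import numpy as np


def independence_criterion(x, blocks):
    """Determinant ratio |a| / prod_k |a_kk| for a sample x of shape (N, m).

    `a` is the matrix of corrected sums of squares and products; `blocks`
    lists the column indices of each hypothesised independent set.
    """
    xc = x - x.mean(axis=0)
    a = xc.T @ xc
    denom = 1.0
    for idx in blocks:
        denom *= np.linalg.det(a[np.ix_(idx, idx)])
    return np.linalg.det(a) / denom


def moment(cov, blocks, n_obs, h, reps, seed):
    """Monte Carlo estimate of E[W**h] for samples of size n_obs from N(0, cov)."""
    rng = np.random.default_rng(seed)
    mean = np.zeros(cov.shape[0])
    draws = [
        independence_criterion(rng.multivariate_normal(mean, cov, size=n_obs), blocks) ** h
        for _ in range(reps)
    ]
    return float(np.mean(draws))


blocks = [[0, 1], [2, 3]]             # two hypothesised sets of two variates each
null_cov = np.eye(4)                  # H_I true: covariance is block diagonal
alt_cov = np.eye(4)
alt_cov[0, 2] = alt_cov[2, 0] = 0.6   # one alternative: x^1 correlated with x^3

results = {}
for h in (0.5, 1.0):
    results[h] = (moment(null_cov, blocks, 20, h, 2000, seed=1),
                  moment(alt_cov, blocks, 20, h, 2000, seed=2))
    print(f"h = {h}: under H_I {results[h][0]:.3f}, under the alternative {results[h][1]:.3f}")
```

Since $W \le 1$ by Fischer's determinant inequality, every moment lies in $(0, 1]$; the simulation only shows that the ordering asserted by the theorem holds for this one alternative.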


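7. The general regression problem. Let the variates $x^1, \dots, x^t$ be distributed according to the law

$$\frac{|B_{ij}|^{1/2}}{(2\pi)^{t/2}}\exp\Big[-\tfrac12 B_{ij}\,(x^i - C^i_p x^p)(x^j - C^j_q x^q)\Big]. \tag{7.1}$$

Throughout this section, let the ranges of the indices be

$$i, j = 1, \dots, t; \qquad p, q = t + 1, \dots, m;$$
$$r, s = 1, \dots, m; \qquad r', s' = 1, \dots, t + q;$$
$$\rho, \nu = t + 1, \dots, t + q; \qquad \sigma, \tau = t + q + 1, \dots, m.$$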


In (7.1) we therefore have $t$ random variates and $m - t$ fixed variates. Consider the hypothesis $H$ that the $x^i$ are independent of the last set of $x$'s, namely $x^\sigma$. We have

$\Omega$: $\|B_{ij}\|$ positive definite, $-\infty < C^i_p < \infty$,

while for $\omega$ we impose the additional requirement

$$C^i_\sigma = 0.$$

Thus in general we have for the distribution of random samples $O_N$, $N > m$.
while when $H$ is true, we have

$$P = \frac{|B_{ij}|^{N/2}}{(2\pi)^{Nt/2}}\exp\Big[-\tfrac12 B_{ij}\sum_{\alpha=1}^{N}(x^i_\alpha - C^i_\rho x^\rho_\alpha)(x^j_\alpha - C^j_\nu x^\nu_\alpha)\Big]. \tag{7.3}$$




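Differentiating (7.2) with respect to the $B$'s and $C$'s and setting the derivatives equal to zero gives us the conditions

$$\sum_{\alpha=1}^{N} C^i_q x^q_\alpha x^p_\alpha = \sum_{\alpha=1}^{N} x^i_\alpha x^p_\alpha, \tag{7.4}$$

$$B^{ij} = \frac1N\sum_{\alpha=1}^{N}(x^i_\alpha - C^i_p x^p_\alpha)(x^j_\alpha - C^j_q x^q_\alpha). \tag{7.5}$$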


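As in section 2, we put

$$a^{rs} = \sum_{\alpha=1}^{N} x^r_\alpha x^s_\alpha,$$

and assume that the fixed values $x^p_\alpha$ have been so chosen that $\|a^{pq}\|$ is positive definite. Then (7.4) and (7.5) can be combined to give

$$\hat B^{ij}_\Omega = \frac1N\,(a^{ij} - a^{ip}a_{pq}a^{qj}),$$

where $\|a_{pq}\| = \|a^{pq}\|^{-1}$.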


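Similarly,

$$\hat B^{ij}_\omega = \frac1N\,(a^{ij} - a^{i\rho}a_{\rho\nu}a^{\nu j}),$$

where $\|a_{\rho\nu}\| = \|a^{\rho\nu}\|^{-1}$. It then follows that

$$\lambda = \left[\frac{|a^{ij} - a^{ip}a_{pq}a^{qj}|}{|a^{ij} - a^{i\rho}a_{\rho\nu}a^{\nu j}|}\right]^{N/2}.$$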


The matrix $\|a^{rs}\|$ will be positive definite except for a set of probability zero, so that we can consider $\|a^{ij} - a^{ip}a_{pq}a^{qj}\|$ as the inverse of the matrix obtained by removing the last $m - t$ rows and columns of the inverse of $\|a^{rs}\|$, and $\|a^{ij} - a^{i\rho}a_{\rho\nu}a^{\nu j}\|$ as the inverse of the matrix obtained by removing the last $q$ rows and columns of $\|a^{r's'}\|^{-1}$. Then by the Jacobi theorem

$$|a^{ij} - a^{ip}a_{pq}a^{qj}| = \frac{|a^{rs}|}{|a^{pq}|}, \qquad |a^{ij} - a^{i\rho}a_{\rho\nu}a^{\nu j}| = \frac{|a^{r's'}|}{|a^{\rho\nu}|},$$

so that the appropriate likelihood ratio is given by

$$\lambda = \left[\frac{|a^{rs}|\;|a^{\rho\nu}|}{|a^{pq}|\;|a^{r's'}|}\right]^{N/2}. \tag{7.6}$$

It will be advantageous to complete the matrix $\|B_{ij}\|$ in (7.1) by defining

$$B_{pq} = C^i_p B_{ij} C^j_q.$$

(Evidently $B_{ip} = 0$ for $i = 1, \dots, t$ and fixed $p$ if and only if $C^j_p = 0$,
$j = 1, \dots, t$.) We can now write (7.2) as

(7.7) P(,, B) - 1 8|j;: 

IT*''* 

We next notice that $\lambda$ is invariant under the transformations


a;* — > — * §^ 1 ^}


so that if we put


$$I(B, \lambda_0) = \int_S P(x, B)\,dx^1_1 \cdots dx^t_N,$$

where the integral is extended over the region

$$S:\ 0 < \lambda < \lambda_0,$$

it turns out that

$$I(B, \lambda_0) = I(B^*, \lambda_0),$$


provided 


B*,- = alBkiot], Bif, = a^BkB, Bj, = alBkrPl- 


To prove the locally unbiased character of the test, we may therefore consider 
the derivatives 






and assume that $\|B^*_{ij}\|$ and $\|a^{pq}\|$ are in diagonal form. We also observe
that $\lambda$ is unaltered by the transformation


-> + B^Btp*". 


We therefore have 


/(B* X.) 


|BJ‘,1*'^ f ".I, 


Thus, 
which is easily seen to be zero. Again, consider a non-repeated second partial
derivative, say 


dBldBt 


= -2 


\BL 


* 


rV^' 




B*^'ar 


- 2 2 


Pa- 2 .tJ-tA 

/S—1 / 




This plainly vanishes if $k \neq l$, but it is by no means easy to see what happens when $k = l$, even when $\sigma \neq \tau$. Let us therefore study the distribution law of $\lambda^{2/N}$ for the case

$$B_{i\sigma} = 0, \qquad i \neq 1.$$


(We shall not, however, assume that the transformation $B \to B^*$ has been
made on the $B$'s.)

Define 

Bpt ~ BpQ — BpiB'^B jq , 


Cl Clf^yCl f 


where $\|a_{\sigma\tau\cdot}\|$ now stands for the inverse of $\|a^{\sigma\tau\cdot}\|$. These expressions will
arise when we adapt Wilks' method of moment generating operators [13], based
on the identity


(7.8) f exp {-Bpqa^ 


to the problem. We shall understand from now on that B = | Btj | and 
= II 5,, IP*. Let us rearrange the form in the exponential on the 
right, thus: 

+ B„a" - 2B„.B’'B„a'" 

- - BqiB'^Bira" 

= Q - B,iB'’BirS^’ 

^Q-B%i. 

A subscript $\xi$ will denote the result of replacing $B_{ij}$ by $B_{ij} + \xi_i\xi_j$, and a prime will indicate that each $\beta_{r's'}$ has been replaced by $\beta_{r's'} + \xi_{r'}\xi_{s'}$. Consider now the result of integrating the right hand side of (7.8) after these replacements have been made:

J gjjp . . . dftdfu-i • • • c^t+t 


(7.9) 




Let us integrate first with respect to the . Wilks has shown how to write
Qf in the form 

Qi = -Qif + , 

where 

Qw = + 2B^«b;«B/,o'' + B^Bi’^B^^o'^o^o". 

This latter expression is thus free of the . Consequently, 

where 


which can be written 

{B,.v,B;’'B,*^B;*'{y{,a'" + 2B,i,B;‘'B.*B;*'{,{,a'' 

Jifi 

+ B,<Bi‘'BrtBi*‘{,f,o"‘o,,o"). 

The method of reduction used by Wilks can now be applied to Qif and Qtf , 
and gives 

Qw + Q'tf, = + 2B,ifB^>Bi,ar + B^B^^By^o'^o.^o", 

an expression which does not involve the $\xi$'s. Thus
(7.10) f = ir‘* I o'" r‘Bj‘»e~*'.B;*«. 

Now the quantity 

B'r, 

r-0 VI 

where $B^{ij}$ stands for the cofactor of $B_{ij}$ in $\|B_{ij}\|$, can be expressed in terms
of $B_\xi$, provided we use our assumption that $B_{i\sigma} = 0$, $i \neq 1$, whereupon $y_{ij}B^{ij}$
reduces to the single term $yB^{11}$. In fact, we have

Big, I o" i*] - n - m + < + 1 - i, 2h) | o'* rir*'*‘B?‘*''-^> 

X - r.ffV 

X OP = «n y V*' I o" Z 
where, following the notation used by Wilks [13],

g, « c"'”*'*, K = exp (-5,,o'*), 


r[i(o + 6)] 




And (7.11) can be written as 


(7.12) 


E\gf I o” 1*] = n ^ • I 

i -1 


'^y' r[i(-^ — ?) + A] d’ ('n-tJ(Ar-*)+*i'j 

^hv\ rii(JV - q) '+ h + v] du' ^ ' 


where $B_u$ stands for the result of replacing $B_{11}$ by $B_{11} - u$. Changing $\beta_{r's'}$ into
$\beta_{r's'} + \xi_{r'}\xi_{s'}$ and integrating, we then find that by virtue of (7.10)


(7.13) 


Now 


E[gi> I a" r I a''*' T*] = I1 1 a'^ 1* 1 a"' 


^-ufry’ mN-q)+h] a' 
^ r ! r[i(N - q) -h h +7] du‘ 




f IliiN - q + 2h + 1 - i, -1), 

J iwmi 

so that (7.13) becomes

E\g0 1 a” 1* lo^'*' r*] = Ur^N - m + t + 1 - i, 2h) 

(7.14) xILHN - q + 2h+l - i, -1)1 0*^1* I o'"!-* 

V R-Io.-O-. v y' IW - 9) + M a' 

® h Vl iTf(r=' g)“+T + r] 

Comparing (7.14) with (7.12), and making use of the fact that

$$\phi(a, -1)\,\phi(a - 1, -1)\cdots\phi(a - 2h + 1, -1) = \phi(a, -2h),$$

we thus have

E\g0 1 o” 1* 1 o''*' r*] » Urf>(N -m + t+l-i, 2h) 


xjliiN - q + 2h + l - i, -2/i)l a”* 1* 1 o'*’ 
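The product identity for $\phi$ used in comparing (7.14) with (7.12) is a telescoping product and is easy to check numerically. The sketch assumes $\phi(a, b) = \Gamma[\tfrac12(a + b)] / \Gamma(\tfrac12 a)$; this is our reading of Wilks' notation, chosen because it is the definition under which the identity telescopes, and the helper names are hypothetical.

```python
from math import gamma, isclose


def phi(a, b):
    """phi(a, b) = Gamma((a + b) / 2) / Gamma(a / 2)  (assumed reading of the notation)."""
    return gamma(0.5 * (a + b)) / gamma(0.5 * a)


def telescoped(a, h):
    """phi(a, -1) phi(a - 1, -1) ... phi(a - 2h + 1, -1), with 2h a positive integer."""
    product = 1.0
    for k in range(int(round(2 * h))):
        product *= phi(a - k, -1)   # Gamma((a-k-1)/2) / Gamma((a-k)/2)
    return product


for a, h in [(17.0, 3), (12.0, 1.5)]:
    print(a, h, telescoped(a, h), phi(a, -2 * h))
```

Each factor cancels the numerator of its predecessor, leaving $\Gamma[\tfrac12(a - 2h)] / \Gamma(\tfrac12 a) = \phi(a, -2h)$.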
Setting the $\beta$'s equal to zero, performing the differentiation, and recalling the
definitions of $R$ and $Q_p$, we then find

= ilHN -m + t + l-i,2h)tltl;{N -q + 2h + l-i, -2h) 


(7.15) 


v(yB“)' rIKAr-g) + x] r[}(i\r - g) + v] 


« yl mN-q) + h+p] rlKAT-g)] ’ 
Taking the first factor from each product, we can convert (7.15) into

n ^(l\r - TO + < + 1 - t, 2A) n ^(iV - g + 2A + 1 ~ i, -2h) 

V /.“»*" V (y^”)’ rti(^ — TO + f) + 5] r[i(7V^ — g) + W 

S rl r[KJV- TO + Or'fft(lV-g) + A + r]' 

This last product of ratios of $\Gamma$'s is equivalent to

Tim - g) + r] r[i(TO -i-q)+ mm -m + t) + h] 

nm- m + o]r[K»» - g) + »']■ nm' - giT a + r] 

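Thus the moments of $\lambda^{2/N}$ are connected with an integral equation of type B [12], and $\lambda^{2/N}$ is distributed like the product

$$z\,\theta_2\cdots\theta_t, \qquad 0 < z < 1, \quad 0 < \theta_i < 1,$$

where the joint distribution of the $\theta$'s is

$$\prod_{i=2}^{t}\frac{\Gamma[\tfrac12(N - q + 1 - i)]}{\Gamma[\tfrac12(N - m + t + 1 - i)]\,\Gamma[\tfrac12(m - t - q)]}\,\theta_i^{\frac12(N - m + t + 1 - i) - 1}(1 - \theta_i)^{\frac12(m - t - q) - 1},$$

and $z$ is distributed independently of the $\theta$'s with the distribution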

(7.16) F{z) -e ^ ^ j ^ Z7 “ g) + yj • 
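When $H$ holds, the law above makes $\lambda^{2/N}$ behave like a product of independent beta variables $\theta_i \sim \mathrm{Beta}\big(\tfrac12(N - m + t + 1 - i), \tfrac12(m - t - q)\big)$, with $z$ playing the part of the factor $i = 1$ (this is the standard Wilks result and is our assumption for $y = 0$). The simulation sketch below checks only the first moment of that claim. The fixed design (a constant column plus normal columns), the sizes, and the function names are illustrative assumptions.

```python
import numpy as np


def simulate_lambda(N, t, n_fixed, q, reps, seed):
    """Draws of lambda^(2/N) under H: y does not depend on the last n_fixed - q fixed variates."""
    rng = np.random.default_rng(seed)
    x_fixed = np.column_stack([np.ones(N), rng.normal(size=(N, n_fixed - 1))])  # held fixed
    resid_full = np.eye(N) - x_fixed @ np.linalg.pinv(x_fixed)   # residual projector, all fixed variates
    x_kept = x_fixed[:, :q]
    resid_kept = np.eye(N) - x_kept @ np.linalg.pinv(x_kept)      # residual projector, retained ones
    out = np.empty(reps)
    for k in range(reps):
        y = rng.normal(size=(N, t))
        out[k] = np.linalg.det(y.T @ resid_full @ y) / np.linalg.det(y.T @ resid_kept @ y)
    return out


def beta_product_mean(N, m, t, q):
    """E[theta_1 ... theta_t] for independent theta_i ~ Beta(a_i, b)."""
    b = 0.5 * (m - t - q)
    a = [0.5 * (N - m + t + 1 - i) for i in range(1, t + 1)]
    return float(np.prod([ai / (ai + b) for ai in a]))


N, t, n_fixed, q = 30, 2, 4, 1   # so m = t + n_fixed = 6
sim = simulate_lambda(N, t, n_fixed, q, reps=4000, seed=3)
theory = beta_product_mean(N, t + n_fixed, t, q)
print(f"simulated mean {sim.mean():.4f}, beta-product mean {theory:.4f}")
```

Here the beta parameters are $(13, 1.5)$ and $(12.5, 1.5)$, so the predicted mean is about $0.8005$.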

The probability that $0 < \lambda < \lambda_0$ is therefore


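$$I(y, \lambda_0) = \int_S f(\theta)F(z)\,dz\,d\theta_2\cdots d\theta_t,$$

where $f(\theta)$ is the joint density of the $\theta$'s given above and $S$ is the region $0 < z\,\theta_2\cdots\theta_t < \lambda_0^{2/N}$. Putting $\psi(\theta)$ for the upper limit of $z$ in $S$ for fixed $\theta$, and $S_\theta$ for the projection of $S$ into the $\theta$ space, we then have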

M I 00 A.Dllxr > 




y\ Jo B[KiV-TO + 0,i(»»-<-g) + »'] 
If we replace z by (1 — z) we then find 


(7.17) 


I(y, Xo) * / m 

Ja$ 

w f' (yB^Bim -t-q) + P,i{N-tn + ty,l^q>]\^, 
i3"7r Bl§(TO~f-g) + r,i(lV-TO + 0] / 
As far as $y$ is concerned, (7.17) is essentially the same as (2.8). The computation which was made there, together with the type of reasoning employed in the latter part of section 5 in connection with the independence test for several blocks, then shows that

$$\frac{\partial}{\partial y}\,I(y, \lambda_0) > 0 \qquad (0 < \epsilon < 1).$$

Remembering that 

y = ^BnBri , 

we see that 

=0 =2o" 

and we remark that the assumed positive definiteness of 1 1 { | implies that of 

11 o" II. Hence the relation 

together with the fact that we could have obtained the analogue of (7.17) under the assumption

$$B_{i\sigma} = 0, \qquad i \neq i_0,$$

where $i_0$ is any fixed number in the set $1, \dots, t$, shows that the matrix of second partial derivatives is positive definite when $H$ is true.

Thus we have 

Theorem IV. Let $x^1, \dots, x^t$ be normally distributed about means which are linear functions of certain fixed variates $x^{t+1}, \dots, x^m$. Then the likelihood ratio test for the hypothesis that the distribution of $x^1, \dots, x^t$ depends only on a selected subset $x^{t+1}, \dots, x^{t+q}$ of the fixed variates is locally unbiased.
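The criterion behind Theorem IV can be computed in two equivalent ways, and checking that they agree is a useful sanity test. On one side, $\lambda^{2/N}$ is the ratio of residual determinants from least-squares fits on all fixed variates and on the retained subset. On the other, by the Jacobi theorem, it equals $|a^{rs}|\,|a^{\rho\nu}| / (|a^{pq}|\,|a^{r's'}|)$, built from the raw product sums $a^{rs} = \sum_\alpha x^r_\alpha x^s_\alpha$. The sketch assumes the first fixed variate is a constant column; the function names are hypothetical.

```python
import numpy as np


def residual_ssp(y, x):
    """Residual sums of squares and products of y (N x t) after least squares on x."""
    coef, *_ = np.linalg.lstsq(x, y, rcond=None)
    resid = y - x @ coef
    return resid.T @ resid


def lr_via_residuals(y, x_fixed, q):
    """lambda^(2/N): |residual SSP on all fixed variates| / |residual SSP on the first q|."""
    det = np.linalg.det
    return det(residual_ssp(y, x_fixed)) / det(residual_ssp(y, x_fixed[:, :q]))


def lr_via_jacobi(y, x_fixed, q):
    """Same ratio from determinants of the raw product-sum matrix a^{rs}."""
    t = y.shape[1]
    z = np.hstack([y, x_fixed])   # columns: x^i, then x^rho (first q fixed), then x^sigma
    a = z.T @ z                   # a^{rs}, r, s = 1, ..., m
    det = np.linalg.det
    a_pq = a[t:, t:]                        # all fixed variates
    a_rsp = a[: t + q, : t + q]             # random variates plus retained fixed variates
    a_rho_nu = a[t : t + q, t : t + q]      # retained fixed variates only
    return det(a) * det(a_rho_nu) / (det(a_pq) * det(a_rsp))


rng = np.random.default_rng(0)
N, t, q, n_fixed = 30, 2, 1, 4              # m = t + n_fixed = 6
x_fixed = np.column_stack([np.ones(N), rng.normal(size=(N, n_fixed - 1))])
y = rng.normal(size=(N, t))
lam = lr_via_residuals(y, x_fixed, q)
lam_j = lr_via_jacobi(y, x_fixed, q)
print(lam, lam_j)
```

The residual form is the numerically safer one in practice; the determinant form mirrors the algebra in the text.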

The result of this section has its most immediate application to those problems in the analysis of variance which require simultaneous consideration of several interrelated dependent variables $x^1, \dots, x^t$ in conjunction with a given set of independent variables $x^{t+1}, \dots, x^m$ [15]. For the usual hypothesis to be tested in this case is that $x^1, \dots, x^t$ are jointly independent of, say, $x^{t+q+1}, \dots, x^m$.

To return to the general case of (7.1), the method of this section can also be used to test the hypothesis that the regression coefficients referring to the $x^\sigma$ have particular values, say

$$C^i_\sigma = C^i_{\sigma 0}, \qquad i = 1, \dots, t;\ \ \sigma = t + q + 1, \dots, m,$$

the remaining $C$'s and the $B$'s being left unspecified. Since we have

$$x^i - C^i_\rho x^\rho - C^i_\sigma x^\sigma = x^i - C^i_\rho x^\rho - (C^i_\sigma - C^i_{\sigma 0})x^\sigma - C^i_{\sigma 0}x^\sigma,$$

by the device of replacing $x^i_\alpha$ by $x^i_\alpha - C^i_{\sigma 0}x^\sigma_\alpha$, we can reduce this problem to that of testing the hypothesis that

$$C'^i_\sigma = C^i_\sigma - C^i_{\sigma 0} = 0.$$


Similarly, the problem of testing whether the linear functions $a^\sigma_\tau C^i_\sigma$ have
specified values comes under the same heading [7].

A particularly interesting case of the general regression problem is that in which $m = t + q + 1$, so that the null hypothesis $H$ states that the chance variables $x^i$ are independent of the fixed variate $x^m$, though they may depend upon $x^{t+1}, \dots, x^{t+q}$. In this case we are able to find the exact distribution law of $\lambda^{2/N}$ without assuming that any of the regression coefficients $C^i_m$ are zero. For the quantity


:7.18) 


V g-mn-Q>+h+yi 

^ vl ^ ’ 


which would have occurred in (7.11) had it not been for the restriction $B_{i\sigma} = 0$
($i \neq 1$), can now be expressed in terms of even without this restriction.
By definition 

and the vanishing of the $B_{mi}$ is equivalent to the vanishing of the regression
coefficients $C^i_m$ associated with $x^m$. And since

1 Bii - ua”'”B„iB^ \= B - ua”"^B*^B„iB„i , 


we can write (7.18) in the form


v 1 nm -q) + h] 

^op\T[i(N-q) + h-^9]'du’’ 




where 


B»^\\ = \\Biif- ua””'B„iB„ 


is positive definite provided $u$ is sufficiently small. Thus the moments of $\lambda^{2/N}$
can be found from (7.16) if we put $a^{mm}B^{ij}B_{mi}B_{mj}$ in place of $yB^{11}$.
Moreover, it can be seen that when the value $m = t + q + 1$ is substituted
into (7.16), that expression reduces to


B[(x*"")"] - E 


{yuB'y BliiN -m + l)+h,Um-q-i) + v] 


vl B[i(A^-m + l),i(»»-?- 1) + V] 
so that $\lambda^{2/N}$ is distributed like $w$, where

(7.19) /(w) » ■ X 




j^l B [|(JV — «i + 1), J(m — ? — 1) + p] 
The distribution law of $\lambda^{2/N}$ for this case is thus closely related to that obtained in the treatment of the regression problem with one dependent variate in section 2. Applying the argument used there, we can obtain:

Theorem IVa. The likelihood ratio test for the hypothesis that in a population of the type (7.1) the variates $x^i$ are independent of $x^m$ (the case $m = t + q + 1$ of Theorem IV) is completely unbiased.

If we specialize the problem somewhat further, considering the case $q = 0$, $x^m_\alpha = 1$ (so that $m = t + 1$), we find that the likelihood ratio takes the form

$$\lambda^{2/N} = \frac{1}{1 + N\,v^{ij}\bar x^i\bar x^j} = \frac{1}{1 + T^2/(N - 1)},$$

where $\|v^{ij}\| = \|v_{ij}\|^{-1}$, $v_{ij} = \sum_{\alpha=1}^{N}(x^i_\alpha - \bar x^i)(x^j_\alpha - \bar x^j)$, and $T$ is Hotelling's generalization [5] of Student's ratio. In this case we are testing the hypothesis that the $x^i$ are distributed with zero means. The exact distribution law of $T$ was recently published by P. L. Hsu [6], who obtained it in a very elegant fashion by means of the Laplace transform. He has also shown that the resulting test is most powerful in the sense that, of all critical regions $S$ for which

P{x C -S} = e + + Rib) 

(where $\epsilon$ and $a$ are independent of the $B_{ij}$ and of the means $b_i$, and $R$ is an infinitesimal of at least the third order as all $b_i$ tend to zero), the critical region defined by

$$S:\ T > T_\epsilon$$

has the largest possible value of $a$. Tang's tables [11] make it evident that this largest possible value of $a$ is actually positive and that the test is in fact unbiased for all values of the $b$'s when $\epsilon = .05$ or $\epsilon = .01$. The results of this section may be used to show that this property extends to all probability levels other than $\epsilon = 0$ and $\epsilon = 1$.
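The relation between the likelihood ratio for zero means and Hotelling's statistic can be verified directly. Under $\omega$ the residual matrix is $V + N\bar x\bar x'$, so $\lambda^{2/N} = |V| / |V + N\bar x\bar x'| = 1/(1 + N\bar x' V^{-1}\bar x)$; with $T^2 = N\bar x' S^{-1}\bar x$ and $S = V/(N - 1)$ this is $1/(1 + T^2/(N - 1))$. The sketch below assumes that standard definition of $T^2$; the function name and the simulated data are illustrative.

```python
import numpy as np


def lr_and_hotelling(x):
    """For a sample x (N x t), return lambda^(2/N) for H: all means zero, and Hotelling's T^2."""
    n = x.shape[0]
    xbar = x.mean(axis=0)
    dev = x - xbar
    v = dev.T @ dev                                        # v_ij
    lam_2n = np.linalg.det(v) / np.linalg.det(v + n * np.outer(xbar, xbar))
    s = v / (n - 1)                                        # unbiased covariance estimate
    t2 = n * xbar @ np.linalg.solve(s, xbar)               # T^2 = N xbar' S^{-1} xbar
    return lam_2n, t2


rng = np.random.default_rng(7)
x = rng.normal(loc=0.3, size=(25, 3))
lam_2n, t2 = lr_and_hotelling(x)
print(lam_2n, 1.0 / (1.0 + t2 / (x.shape[0] - 1)))
```

Because $\lambda^{2/N}$ is a decreasing function of $T^2$, the region $\lambda < \lambda_0$ coincides with $T > T_\epsilon$.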

The application of Hotelling's $T$ is by no means confined to the above case. Other hypotheses which can be tested by means of this statistic are discussed by Hsu [6]. In addition it is now known that the Studentized $D^2$, devised by Mahalanobis for measuring the “distance” between two normal multivariate populations, is proportional to Hotelling's $T^2$. This fact is pointed out by R. C. Bose and S. N. Roy [1], who have obtained the exact distribution of $D^2$ for the case in which the two populations from which the samples are drawn are assumed to have the same matrix of variances and covariances, but are allowed to have different sets of means; their work, however, is quite independent of Hsu's. They also note that $D^2$ is proportional to the ratio which arises in Fisher's method of multiple measurements [4].
8. Summary. The method of likelihood ratios is of practical as well as theoretical importance, because it provides a unified approach to the problem of testing statistical hypotheses. In this paper we have investigated many of the tests which this method yields when applied to hypotheses about sets of regression coefficients and covariances in normal populations. By studying the probability functions of the corresponding $\lambda$-criteria we are able to show that these tests are “good,” in the sense that they are unbiased even for small samples.

Among the completely unbiased tests which can be based on the likelihood-ratio method, our discussion includes: the multiple correlation coefficient, with or without fixed variates [13]; Hotelling's generalized $T$ test [6] and the statistically equivalent “Studentized $D^2$” [1]; the ordinary analysis of variance and covariance for orthogonal or non-orthogonal data [11, 16], as well as related tests of linear hypotheses in the case of one chance variable.

With respect to the analysis of variance for two or more variables [15] and certain other hypotheses regarding regression coefficients in multivariate populations, though there are indications that the tests are completely unbiased, we have succeeded in demonstrating this property only in the local sense.

Finally, the likelihood-ratio test for the hypothesis that the variates fall into certain specified mutually independent sets [14] is shown to be unbiased, at least locally, and has the additional property described in Theorem IIIa.

In conclusion, much more than a word of acknowledgment is due to Professor 
S. S. Wilks of Princeton University, to whom the writer is greatly indebted for 
advice and encouragement. 
INTRODUCTION

An important portion of algebraic invariant theory has been that devoted to a
certain class of invariants called seminvariants, semi-invariants, or more rarely,
half-invariants. Of these terms, “seminvariant” seems to be the one now
commonly accepted. The same three terms have been applied at various times
and by various writers to a system of moment functions of importance in sta-
tistical theory. The statistician using these terms has frequently done so with
an apology for appropriating a term of the algebraist. As a portion of this
paper we shall show that the moment functions of this system are actually
algebraic seminvariants, and that there are other systems of moment functions
which are equally entitled to the name seminvariant.
The study of the statistical seminvariants of a population leads naturally to
consideration of the problem of obtaining from a sample unbiased estimates of
the value of these seminvariants. Estimates of this kind have been defined
and computed by previous authors, but no simple method of obtaining the
estimates has been given. In this paper a simple procedure for calculation is
given and it is furthermore demonstrated that these estimates form an important
phase of statistical seminvariant theory.

The system of notation used for moment functions is that of R. A. Fisher, 
although the actual letters used in representing particular moment functions are 
not altogether the same as those used by Fisher. In general, a moment function 
of the population has been indicated by a Greek letter, the corresponding sample 
moment function by the corresponding English letter and the estimate by the 
corresponding capital English letter. 

A list of references appears at the end of the paper. Each reference has been 
assigned a number and this number placed in square brackets is used in the body 
of the paper to indicate the reference. Pages of the reference are indicated by 
additional numbers inserted in the parentheses and separated from the reference 
number by a semicolon. 


I. THE RELATION OF THE ALGEBRAIC SEMINVARIANT THEORY TO THE MOMENT FUNCTIONS OF STATISTICS


The purposes of this chapter are: (1) to review briefly and give adequate 
references to certain important phases of algebraic seminvariant theory, (2) to 
apply this material to the moment functions of statistics. 

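1. Definitions. Any function of the coefficients of the binary form

$$f = \sum_{i=0}^{n}\binom{n}{i}a_i x^{n-i}y^i \tag{1}$$

which is invariant under the transformation

$$x = \gamma_1\xi + \delta_1\eta, \qquad y = \gamma_2\xi + \delta_2\eta, \qquad \Delta = \begin{vmatrix}\gamma_1 & \delta_1\\ \gamma_2 & \delta_2\end{vmatrix} \neq 0, \tag{2}$$

is called an invariant of the form $f$. See Dickson [1; 31-36].

Any function of the coefficients of $f$ which is invariant under the transformation

$$x = \xi + \gamma\eta, \qquad y = \eta, \tag{3}$$

is called a seminvariant of $f$.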

The two operators

$$\Omega = \sum_{i=1}^{n} i\,a_{i-1}\frac{\partial}{\partial a_i}, \qquad O = \sum_{i=1}^{n}(n - i + 1)\,a_i\frac{\partial}{\partial a_{i-1}}, \tag{4}$$

are of fundamental importance in the theory of algebraic invariants and seminvariants and, indeed, invariants and seminvariants may be defined by means of these operators. A necessary and sufficient condition that a homogeneous isobaric function of the coefficients of $f$ be an invariant is that it be annihilated by both $\Omega$ and $O$. See Elliott [2; 113, 124]. The necessary and sufficient condition that a homogeneous isobaric function of the coefficients of $f$ be a seminvariant is that it be annihilated by $\Omega$. See Elliott [2; 127].

It should be noted that there is nothing in the definitions above which requires 
that invariants or seminvariants be integral, although usually only this type is 
discussed. In what follows we shall find it more profitable to discuss homogeneous
isobaric fractional seminvariants, the fractional quality resulting from
the appearance of $a_0$ in the denominator.


2. Complete Systems of Seminvariants. By direct application of the transformation (3) to $f$ the system of seminvariants [1; 47]

$$A_r = \sum_{i=0}^{r}(-1)^i\binom{r}{i}\frac{a_{r-i}\,a_1^{\,i}}{a_0^{\,i+1}}, \qquad r \le n, \tag{5}$$

is obtained. This system is a complete system [2; 44, 205, 206], in the sense that all other seminvariants fractional in $a_0$ and of degree 0 are expressible rationally and integrally in terms of this system.
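A numerical check makes the seminvariant property concrete. Writing $f = a_0\prod_k (x - \alpha_k y)$, the transformation (3) replaces every root $\alpha_k$ by $\alpha_k - \gamma$, so a seminvariant must be unchanged when all roots are shifted. The sketch computes $A_r = \sum_i (-1)^i \binom{r}{i} a_{r-i}a_1^i / a_0^{i+1}$ (the complete system of this section as we read it) before and after such a shift. The particular roots, shift, and function names are arbitrary, illustrative choices.

```python
import numpy as np
from math import comb, isclose


def coefficients_from_roots(roots, a0=1.0):
    """a_0..a_n with sum_i C(n, i) a_i x^(n-i) y^i = a0 * prod_k (x - alpha_k y)."""
    n = len(roots)
    monic = np.poly(roots)    # [1, -e1, e2, -e3, ...], highest power of x first
    return [a0 * float(monic[i]) / comb(n, i) for i in range(n + 1)]


def seminvariant_A(r, a):
    """A_r = sum_i (-1)^i C(r, i) a_{r-i} a_1^i / a_0^(i+1)."""
    return sum((-1) ** i * comb(r, i) * a[r - i] * a[1] ** i / a[0] ** (i + 1)
               for i in range(r + 1))


roots = [0.5, -1.2, 2.0, 3.3, -0.7]
gamma_shift = 1.7   # transformation (3): x = xi + gamma*eta shifts every root by -gamma
a = coefficients_from_roots(roots, a0=2.0)
a_shifted = coefficients_from_roots([alpha - gamma_shift for alpha in roots], a0=2.0)
for r in range(2, len(roots) + 1):
    print(r, seminvariant_A(r, a), seminvariant_A(r, a_shifted))
```

The numerical coefficients of each $A_r$ are $(-1)^i\binom{r}{i}$, whose sum vanishes for $r \ge 1$, in line with the remark below that the coefficients of any seminvariant sum to zero.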

Other such systems can be defined. The system of minimum degree seminvariants, the seminvariants of even weight being of degree 2 and those of odd weight being of degree 3, has played an important role in the algebraic seminvariant theory. Elliott [2; 207-209] discusses this system and gives the general formula for the even weight seminvariants of the system. So far as the present writer has been able to discover, the general formula for the odd weight seminvariants has never been published, although Hammond [3] may have obtained it. After some lengthy but not difficult computation the result has been obtained, so that the last mentioned system of seminvariants is completely defined by


2 i-0 \t / a? 

(6) ~ i (-!)<*' 

+ r/ t -f r + 1 


Orr-4 * 4-1 

al 


+ 1 : 

<-0 


( 2r\ 

*/ oj 


It is easily demonstrated that for each of the above seminvariants, and in 
fact for any seminvariant, the sum of the numerical coefficients is zero. Dickson 
[1 ; 55] gives a suggestion leading to a very simple proof. 


3. The MacMahon Non-Unitary Symmetric Function Principle. Denoting the roots of $\sum_{i=0}^{n}\binom{n}{i}a_i x^{n-i} = 0$ by $\alpha_1, \alpha_2, \dots, \alpha_n$, the $r$-th power sum of these roots is defined by

$$s_r = \sum_{i=1}^{n}\alpha_i^r. \tag{7}$$

The form $f$ may be written $a_0\prod_{i=1}^{n}(x - \alpha_i y)$.

By a result due to MacMahon [4; 131] the seminvariants of the form $f$ are
identical, except for numerical factors, with those symmetric functions of the
roots of

(8) 

t-»0 tl 

which when expressed in terms of sums of powers of these roots do not contain $s_1$. MacMahon called such symmetric functions “non-unitary.”

As a result of this theorem, MacMahon was able to discuss the seminvariants 
of a binary form of infinite order by discussing the non-unitary symmetric 

functions of the roots of 2 7* = 0. 

t-O tl 


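4. A Third Complete System of Seminvariants. By application of the result stated in the previous section, a third complete system of seminvariants can be immediately obtained. Obviously the power sums $s_r$, $r > 1$, are independent of $s_1$. By the Waring formula, Burnside and Panton [5; 91-92], if

$$\sum_{i=0}^{n} c_i Y^i = c_0\prod_{i=1}^{n}(1 - \alpha_i Y),$$

then

$$s_r = \sum\frac{(-1)^{\rho}\,r\,(\rho - 1)!}{\pi_1!\,\pi_2!\cdots\pi_n!}\left(\frac{c_1}{c_0}\right)^{\pi_1}\left(\frac{c_2}{c_0}\right)^{\pi_2}\cdots\left(\frac{c_n}{c_0}\right)^{\pi_n}, \tag{9}$$

wherein the summation extends over all sets of non-negative integers $\pi_1, \dots, \pi_n$ with $\pi_1 + 2\pi_2 + \cdots + n\pi_n = r$, and $\rho = \pi_1 + \pi_2 + \cdots + \pi_n$. Then for

$$\sum_{i=0}^{n} c_i Y^i = \sum_{i=0}^{n}\frac{a_i}{i!}Y^i,$$

$$(r - 1)!\,s_r = \sum\frac{(-1)^{\rho}\,(\rho - 1)!\,r!}{\pi_1!\,\pi_2!\cdots\pi_n!\,(2!)^{\pi_2}\cdots(n!)^{\pi_n}}\left(\frac{a_1}{a_0}\right)^{\pi_1}\left(\frac{a_2}{a_0}\right)^{\pi_2}\cdots\left(\frac{a_n}{a_0}\right)^{\pi_n}. \tag{10}$$

Placing $B_r = -(r - 1)!\,s_r$, the $B$'s form a complete system of seminvariants. This result has some interesting statistical connections which will be mentioned later.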
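The Waring formula (9) can be exercised directly: enumerate the sets $\pi_1, \dots, \pi_n$ with $\sum_k k\pi_k = r$, evaluate the sum from the coefficients $c_i$ of $c_0\prod_k(1 - \alpha_k Y)$, and compare with $\sum_k \alpha_k^r$ computed from chosen $\alpha$'s. This is a minimal sketch; the helper names and the sample $\alpha$'s are illustrative.

```python
from math import factorial, isclose, prod


def multiplicity_tuples(r, n):
    """All (pi_1, ..., pi_n) of non-negative integers with pi_1 + 2 pi_2 + ... + n pi_n = r."""
    def rec(k, remaining):
        if k > n:
            if remaining == 0:
                yield ()
            return
        for p in range(remaining // k + 1):
            for rest in rec(k + 1, remaining - k * p):
                yield (p,) + rest
    yield from rec(1, r)


def waring_power_sum(r, c):
    """s_r from c_0, ..., c_n where sum_i c_i Y^i = c_0 prod_k (1 - alpha_k Y), formula (9)."""
    n = len(c) - 1
    total = 0.0
    for pis in multiplicity_tuples(r, n):
        rho = sum(pis)
        coef = (-1) ** rho * r * factorial(rho - 1) / prod(factorial(p) for p in pis)
        total += coef * prod((c[k + 1] / c[0]) ** p for k, p in enumerate(pis))
    return total


def coefficients_from_alphas(alphas, c0=1.0):
    """Expand c0 * prod_k (1 - alpha_k Y) into [c_0, c_1, ..., c_n]."""
    c = [c0]
    for alpha in alphas:
        # multiplying by (1 - alpha Y): new c_k = c_k - alpha * c_{k-1}
        c = [same - alpha * prev for same, prev in zip(c + [0.0], [0.0] + c)]
    return c


alphas = [1.5, -0.5, 2.0]
c = coefficients_from_alphas(alphas, c0=2.0)
for r in range(1, 7):
    print(r, waring_power_sum(r, c), sum(alpha ** r for alpha in alphas))
```

Putting $c_i = a_i / i!$ in the same routine gives (10), and hence the $B_r$.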


5. Linearly Independent Seminvariants. It follows from the MacMahon non-unitary symmetric function principle, or it can be proved easily in other ways, that the number of linearly independent seminvariants of a given weight $r$ is equal to the number of partitions of $r$ which contain no unit part. Furthermore we have at our disposal a simple method for obtaining a set of linearly independent seminvariants of any given weight.

For many purposes the power product defined by Dwyer [6; 13] is more useful than the customary monomial symmetric function. The power product is defined by the right hand member and indicated by the left hand member of

$$(q_1 q_2 \cdots q_r) = \sum \alpha_i^{q_1}\alpha_j^{q_2}\cdots\alpha_l^{q_r}, \tag{11}$$

where, for convenience, $q_1 \ge q_2 \ge \cdots \ge q_r$, and the summation extends over all ordered selections of distinct subscripts $i, j, \dots, l$. The monomial symmetric function, which will be denoted by $M(q_1 \cdots q_r)$, is related to the power product by the identity

$$\pi_1!\,\pi_2!\cdots\pi_s!\;M(q_1^{\pi_1}q_2^{\pi_2}\cdots q_s^{\pi_s}) = (q_1^{\pi_1}q_2^{\pi_2}\cdots q_s^{\pi_s}), \tag{12}$$

so that a distinction occurs only when there are repeated exponents in the summation of (11).

If we desire a system of linearly independent seminvariants of weight 6, by
the MacMahon principle we need only to compute the values of the power
products (6), (42), (33), (222) in terms of the $a$'s. In a somewhat different
form these will be presented later.
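The distinction between the power product (11) and the monomial symmetric function, and the factorial factor in (12), can be seen by brute force on a few numbers. The sketch sums over ordered selections of distinct roots for the power product and counts each distinct monomial once for $M$; the roots and exponent patterns are illustrative, including the weight-6 patterns named above.

```python
from collections import Counter
from itertools import permutations
from math import factorial, prod


def power_product(exponents, roots):
    """Dwyer's (q_1 ... q_r): sum over ordered choices of distinct roots."""
    return sum(prod(roots[i] ** q for i, q in zip(idx, exponents))
               for idx in permutations(range(len(roots)), len(exponents)))


def monomial_symmetric(exponents, roots):
    """M(q_1 ... q_r): every distinct monomial counted exactly once."""
    monomials = {tuple(sorted(zip(idx, exponents)))
                 for idx in permutations(range(len(roots)), len(exponents))}
    return sum(prod(roots[i] ** q for i, q in mono) for mono in monomials)


def repetition_factor(exponents):
    """pi_1! pi_2! ... for the multiplicities of the repeated exponents."""
    return prod(factorial(k) for k in Counter(exponents).values())


roots = [1, 2, 3, 4]
for exps in [(4, 2), (3, 3), (2, 2, 2), (2, 2)]:
    pp, m = power_product(exps, roots), monomial_symmetric(exps, roots)
    print(exps, pp, m, repetition_factor(exps) * m)
```

For the roots 1, 2, 3, for instance, $(22) = 98$ while $M(22) = 49$, the factor being $2!$.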

6. The Roberts Theorem. Roberts, see [2; 231] and [5; 108], demonstrated the existence of a duality relationship between power sums, $s$'s, and coefficients, $a$'s, such that corresponding to any seminvariant in terms of $a$'s there exists a seminvariant in terms of $s$'s obtained by replacing $a_i$ by $s_i$. The proof consists of showing that the annihilator for seminvariants in terms of power sums is identical in form with $\Omega$, $a_i$ being replaced by $s_i$.

As a result of this duality, each of the systems of seminvariants which have been obtained yields, upon replacement of $a_i$ by $s_i$, another system of seminvariants. In particular cases it may happen that the systems are identical when the identities connecting the $a_i$ and $s_i$ are taken into consideration.

We next wish to show that the systems of power sum seminvariants thus 
obtained either are identical with certain well known statistical moment func- 
tions or lead to new ones. 

7. Statistical Distributions Represented by Binary Forms. The fact that statistical distributions may be represented by polynomials has long been recognized by statisticians, see Thiele [7; 24-26] and Bertilsen [8]. Indeed it was this fact which led Thiele to the definition of the seminvariants now called by his name. If we have given $n$ observations $\alpha_1, \alpha_2, \dots, \alpha_n$, form the polynomial

$$F = \prod_{i=1}^{n}(X - \alpha_i) = \sum_{i=0}^{n}\binom{n}{i}a_i X^{n-i}. \tag{13}$$
F is not a binary form, but the seminvariant theory of binary forms is applicable
since seminvariants are functions of the differences of the roots and are inde-
pendent of the $X$ and $Y$, which appear merely as convenient symbols to indicate
the various terms of the algebraic form.

For distributions containing an infinite number of items the form F is of 
infinite order, but discussion of its seminvariants may be carried on by use of 
the MacMahon principle given in section 3. 


8. Three Systems of Statistical Seminvariants. Before exhibiting some systems of statistical seminvariants it may be well to consider the meaning of “statistical seminvariant,” for this phrase has been undefined. In fact the use of the phrase is merely a matter of convenience in that it emphasises the fact that seminvariant moment functions have not previously been regarded as algebraic seminvariants. As used here a statistical seminvariant is an algebraic seminvariant which has some application in statistical theory.

The system of seminvariants (5) yields by application of the Roberts Theorem the well known system of statistical seminvariants usually called central moments. If $\mu'_r = \frac{1}{n}s_r = s_r/s_0$, the general formula may be written

$$\mu_r = \sum_{i=0}^{r}\binom{r}{i}\mu'_{r-i}(-\mu'_1)^i. \tag{14}$$
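Formula (14) is the familiar passage from moments about the origin to central moments, and its seminvariant character shows up as invariance under translation of the observations. The sketch evaluates (14) from $\mu'_r = s_r/s_0$, compares it with central moments computed directly, and repeats the computation after adding a constant to every observation. The data values and the shift are arbitrary illustrations.

```python
from math import comb, isclose


def raw_moments(data, r_max):
    """mu'_0, ..., mu'_{r_max} with mu'_r = s_r / s_0 (s_0 = n)."""
    n = len(data)
    return [sum(x ** r for x in data) / n for r in range(r_max + 1)]


def central_from_raw(mu_raw, r):
    """(14): mu_r = sum_i C(r, i) mu'_{r-i} (-mu'_1)^i."""
    return sum(comb(r, i) * mu_raw[r - i] * (-mu_raw[1]) ** i for i in range(r + 1))


data = [2.0, 3.5, -1.0, 4.0, 0.5]
mean = sum(data) / len(data)
direct = [sum((x - mean) ** r for x in data) / len(data) for r in range(5)]
via_14 = [central_from_raw(raw_moments(data, 4), r) for r in range(5)]
via_14_shifted = [central_from_raw(raw_moments([x + 10.0 for x in data], 4), r) for r in range(5)]
print(direct, via_14, via_14_shifted)
```

In floating point the shifted computation loses a few digits to cancellation, which is why central moments are usually computed from deviations in practice.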

The system of seminvariants (6) likewise leads to 

( 16 ) - g + 


a system which seems never to have been used by statisticians. 
The system (10) leads to the well known Thiele seminvariants


$$\lambda_r = \sum\frac{(-1)^{\rho - 1}(\rho - 1)!\,r!}{\pi_1!\,\pi_2!\cdots\pi_r!\,(2!)^{\pi_2}\cdots(r!)^{\pi_r}}\,(\mu'_1)^{\pi_1}(\mu'_2)^{\pi_2}\cdots(\mu'_r)^{\pi_r}. \tag{16}$$

From sections 3 and 4 it is apparent that the general formula for the Thiele seminvariants is a special case of the Waring formula for power sums in terms of coefficients. It does not seem that this fact has been previously recognized. An equivalent way of stating this idea is to say that the Thiele seminvariant $\lambda_r$ is, except for the factor $-(r-1)!$, the sum of the $r$-th powers of the roots of the equation obtained by setting the moment generating function,

$$M_x(Y) = \sum_{i=0}^{\infty}\frac{\mu'_i}{i!}Y^i,$$

equal to zero.
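Formula (16) can be evaluated mechanically by enumerating the sets $\pi_1, \dots, \pi_r$ with $\sum_k k\pi_k = r$. The sketch does this from raw moments and then checks two things: that the low-order results agree with the familiar expressions in central moments ($\lambda_2 = \mu_2$, $\lambda_3 = \mu_3$, $\lambda_4 = \mu_4 - 3\mu_2^2$, $\lambda_5 = \mu_5 - 10\mu_3\mu_2$), and that $\lambda_r$ for $r \ge 2$ is unchanged by a translation of the data. The data and helper names are illustrative.

```python
from math import factorial, isclose, prod


def multiplicity_tuples(r):
    """All (pi_1, ..., pi_r) of non-negative integers with sum_k k * pi_k = r."""
    def rec(k, remaining):
        if k > r:
            if remaining == 0:
                yield ()
            return
        for p in range(remaining // k + 1):
            for rest in rec(k + 1, remaining - k * p):
                yield (p,) + rest
    yield from rec(1, r)


def thiele(r, mu_raw):
    """lambda_r from raw moments mu_raw[k] = mu'_k via formula (16)."""
    total = 0.0
    for pis in multiplicity_tuples(r):
        rho = sum(pis)
        coef = (-1) ** (rho - 1) * factorial(rho - 1) * factorial(r)
        coef /= prod(factorial(p) * factorial(k + 1) ** p for k, p in enumerate(pis))
        total += coef * prod(mu_raw[k + 1] ** p for k, p in enumerate(pis))
    return total


def raw_moments(data, r_max):
    n = len(data)
    return [sum(x ** k for x in data) / n for k in range(r_max + 1)]


def central_moments(data, r_max):
    n = len(data)
    m = sum(data) / n
    return [sum((x - m) ** k for x in data) / n for k in range(r_max + 1)]


data = [1.0, 2.5, -0.5, 3.0, 4.5, 0.0]
mu = central_moments(data, 5)
kappa = {r: thiele(r, raw_moments(data, 5)) for r in range(1, 6)}
kappa_shifted = {r: thiele(r, raw_moments([x - 3.0 for x in data], 5)) for r in range(1, 6)}
print(kappa)
```

Only $\lambda_1$, the mean, moves under the translation; this is the statistical face of the non-unitary principle of section 3.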
It is of historical interest to note that MacMahon published his non-unitary
function principle and the resulting set of seminvariants in 1884. Cayley [8]
published an article in 1885 dealing with this same system. Roberts' Theorem
having been known for some time (probably about 20 years), it seems probable
that MacMahon and Cayley were aware of the Thiele seminvariants four to
five years before Thiele's definition [9] by an entirely different method.

9. Linearly Independent Statistical Seminvariants. At the end of section 5 a method was indicated whereby a complete set of linearly independent seminvariants of a given weight $r$ could be obtained. It has been noted previously that the one part symmetric function $s_r$, or $(r)$, leads to the Thiele seminvariant $\lambda_r$. As a further illustration consider the power product (22). From a table of symmetric functions we find that

$$(22) = \frac{2}{4!}\left(\frac{a_4}{a_0} - \frac{4a_1a_3}{a_0^2} + \frac{3a_2^2}{a_0^2}\right),$$

and by the Roberts Theorem the statistical seminvariant

$$\frac{2}{4!}\left(\mu'_4 - 4\mu'_1\mu'_3 + 3\mu'^2_2\right)$$

is obtained. In similar fashion a system of linearly independent seminvariants of weight $\le 8$ has been computed and is given in Table I. For the sake of brevity they are expressed in terms of central moments. Hence the degree, by which is meant the maximum degree in the $\mu'$'s, is not apparent in the table. This definition of degree associates with the statistical seminvariant the degree (in the usual sense) of the corresponding homogeneous integral seminvariant.
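The remark in section 8 that $\lambda_r$ is $-(r-1)!$ times a power sum suggests a quick way to reproduce entries of Table I up to a numerical factor. Put $s_r = -\lambda_r/(r-1)!$ for the hypothetical "roots" of the moment generating function, express a power product through power sums (for example $(22) = s_2^2 - s_4$ and $(32) = s_3s_2 - s_5$), and rewrite the result in central moments. The sketch below does this numerically. The normalizing factors it produces ($1/6$ and $1/24$) depend on this convention and need not match those of the paper; only the shapes $\mu_4 + 3\mu_2^2$ and $\mu_5 + 2\mu_2\mu_3$ are being checked.

```python
from math import isclose


def central_moments(data, r_max):
    n = len(data)
    m = sum(data) / n
    return [sum((x - m) ** k for x in data) / n for k in range(r_max + 1)]


data = [1.0, 2.5, -0.5, 3.0, 4.5, 0.0, 2.0]
mu = central_moments(data, 5)

# Thiele seminvariants of orders 2-5 in terms of central moments (standard formulas).
k2, k3 = mu[2], mu[3]
k4 = mu[4] - 3 * mu[2] ** 2
k5 = mu[5] - 10 * mu[3] * mu[2]

# Power sums of the "roots" of the moment generating function: s_r = -lambda_r / (r - 1)!
s2, s3, s4, s5 = -k2, -k3 / 2, -k4 / 6, -k5 / 24

# Power products written through power sums.
pp22 = s2 ** 2 - s4          # (22) = sum over i != j of alpha_i^2 alpha_j^2
pp32 = s3 * s2 - s5          # (32) = sum over i != j of alpha_i^3 alpha_j^2

print(pp22, (mu[4] + 3 * mu[2] ** 2) / 6)
print(pp32, (mu[5] + 2 * mu[2] * mu[3]) / 24)
```

Algebraically, $(22) = \lambda_2^2 + \lambda_4/6 = (\mu_4 + 3\mu_2^2)/6$ and $(32) = (\lambda_5 + 12\lambda_2\lambda_3)/24 = (\mu_5 + 2\mu_2\mu_3)/24$, the degree-2 and degree-3 entries of weights 4 and 5.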

10. Statistical Invariants. If the transformation

$$x = \xi + mk\eta, \qquad y = m\eta \tag{17}$$

is applied to the binary form $f$ and, if, in particular



one system of invariants of $f$ under this transformation is found to be

$$D_r = A_r/A_2^{r/2}, \qquad r \le n, \tag{18}$$

where $A_r$ is defined in (5). By the Roberts Theorem we obtain the fact that the standard moment $\mu_r/\mu_2^{r/2}$ is an invariant of $f$ under this transformation. Thus the standard moments, or standard seminvariants in general, have also an algebraic connection. The effect of the transformation (17) on the roots of $f$ is indicated by

$$x - \alpha_i y = \xi + mk\eta - m\alpha_i\eta = \xi - m(\alpha_i - k)\eta.$$



40 


PAUL L. DBSSSBL 


If m and k are defined as above, the result is the equivalent of measuring in 
standard units denoted by 

VM* 

The system (18) is not a system of algebraic invariants, for algebraic invariants 
must be invariant under rotation, translation and change of scale, or stretching. 
The component parts of the above system are invariant only under the last two 


TABLE I 

Linearly Independent Seminvariants of Weight ^ 8 


Weight 

Degree 

Seminvariants 


! 

Weight i 

1 

Degree 

Semin- 

variants 


6 

6 

15/i4Mi ~ 10 m8* H- 30/1*8* 

0 

1 


4 

/At + 5/Ai/i3 — 10/i** ~ 30^8* 

2 

2 

M2 

3 

fit — •+“ 20^18* 4* 30/*8* 

3 

3 


2 

M6 4* 15m 4M8 ~ lO/ia* 

4 

4 

M4 — 3m8* 

7 

7 

fju — 21/*B/Ltj — 35/i4/i8 4“ SOfiaMi* 

2 

M4 4 3m2* 

6 

fn 4- 9n6fji‘2 ~ 35/*4M8 ^ OO^aiMa* 

6 

5 

M5 — IOmiMs 

4 

fij ~ 21miM3 4“ 25pL4/Ai 4“ 30A*aM8* 

3 

Mft 4- 2 miM* 

3 

fA7 4* — SmiA*! 



8 

8 

fjLB ““ 28/iflf*2 — 5%fAtfAt ~ 70/ti4* 4^ 210At4Ma* 4 280/i8V2 “* 105 m** 

6 

MS 4" 14m«M2 5d/itfS9 — 35m4* “* 210m4Ms* 4" HOmsV* 4“ 630m2* 

5 

Ms *“ 28MeM3 4" 49 mimi 35m4* 4" 420m4M** 490m**M2 — 630m8* 

4 

Ms “■ 28MaM2 "" 56 mim8 4" 106m4* 420m4M2* 4“ SOOmsV* 4” 630m2* 

4 

Ms 4" 14 m«M2 “■ 56mims 4" 35m4* 210m4M2* 4" 140mi*M2 

3 

Ms ““ 7 m«M3 4" 49msms 36m4* 4" 106m4M2* 70msV2 

2 

MB 4- 28mims — 56mimi 4- 35m4* 


types of transformation. In statistics translation and change? of scale ordinarily 
constitute the only desired transformations so that the standard seminvariants 

“fr, ; [r, • might well be called statistical invariants. 

Ma M 

11. Seminvariants and Invariants of Samples. Consideration of the defini- 
tion of seminvariants and invariants shows that: 

1. A semin variant is a seminvariant not because it is a function of deviations 
from the mean, but because it is a function of the differences of the observations; 

2. An invariant is an invariant not because it is a seminvariant divided by 
the standard deviation raised to the proper power, but because it is a ratio of 
two seminvariants which are of the same order in powers of the observations. 
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These facts are important from the statistics viewpoint because they show 
that seminvariants and invariants of samples are also seminvariants and invari- 
ants of the population from which the samples are drawn. 


n. BSTMATEB .. 

1. Power Product Seminvariants. The Roberts Theorem set up a duality 
relationship between seminvariants expressed in terms of coefficients and semin- 
variants in terms of power sums. It can be shown that corresponding to each 
pair thus determined there exists a third scminvariant expressed in terms of 
power products. This leads to what may be called a triple system of semin- 
variants, the interrelationships being most apparent when all three seminvariants 
are expressed in terms of the notation defined by (11). The seminvariant 

oj 3<^i 2oi jjj notation 

do Go Gq 

(111) 3(11)(1) 2(1)' 

i(3) n* 

Tlic corresponding power sum seminvariant Is 

(?) ^ ?(?L(1) J- 2(1/ 

n n* ' 

while the jicwer product seminvariant just mentioned is 

(?) ^ ?(?!) . 2(111) 
n 


The value of the power product notation lies in the fact that the numerical 
coefficients of the three seminvariants are then identical, while this is not the 
case when monomial and elementary symmetric functions are used. 

Perhaps a few remarks are in order in regard to the proof of the relationship 
above expressed. The annihilator, corresponding to SI, for seminvariants in 
terms of roots is, see [2; 230“31], 


t-i oai 


It is easy to see that 


• • p%' f Pi Ij • • p» *)| 


and also that, 


(pi'pp ••• pJ-^y o) ^ (n - p + i){pi^ > > > ^ (pi^' » » > p«~rO 


Since 
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and 

(pi)'*(p«)'* (p.-i)'-»(0 ) ^ MpJ'H pi )'*-- - _ (pi)'* ••• (p.-i)'-» 

no ' "" n» " ” ’ ' no-‘ ’ 

it b^mes evident that corresponding to any power sum seminvariant there 
exists a power product seminvariant with the same numerical coefficirats. The 
converse is also true. 


2. Unbiased Estimates of Rational Integral Moment Functions. If r repre- 
sents a population parameter, and if t represents such a function of n observa- 
tions that the expected value of f is equal to t; then t is said to be an unbiased 
estimate of r. See Tschuprow [11; 74-76], Bertilsen [8; 144], and Fisher [12]. 

Let (pipj • • • p.) denote a power product computed from a sample, the sample 
being from an infinite population. Then it is well known that 

B,r(PiP> ••• P*)1 _ . ' ' ' 

n being the number of items in the sample. If ET^ be interpreted as “unbiased 
estimate of,” the above relation may also be written 


(19) 






M,..l = 


(pips • • • p.) 


,,1.1 


and it is seen at once that the power product semin variants defined in section 1, 
if computed from a sample of n observations, are the unbiased estimates of tht; 
corresponding power sum .seminvariants of the infinite population from which 
the sample is drawn. 

This provides an algebraic interpretation as well as a different approach to a 
topic which has already aroused considerable interest among statisticians. In 
1927 Bertilsen [8; 144] gave the estimates of the first four Thiele seminvariants 
of the population in terms of Thiele seminvariants of the sample. In 1929 
R. A. Fisher [12] also obtained these results and gave in addition the estimates 
of the fifth and sixth Thiele seminvariants. His results are in terms of sample 
moments. In 1937, P. S. Dwyer [13; 26] gave the estimates of the first five 
population central moments and indicated also means for obtaining the astimate 
of any rational integral isobaric moment function. 

In the remainder of this chapter 

(1) Dwyer's method will be extended and perhaps somewhat simplified, 

(2) certain properties of this type of estimate will be pointed out, 

(3) estimates of all seminvariants of weight g 8 will bo made available. 


3. Computation of Estimates. From the relationship (19) it is possible to 
write down inunediately in a simple, although not immediately useful, form the 
estimate of any rational integral moment function. Thus the fourth Thiele 
seminvariant is given by 

A4 *■ M4 ~ — 3au* 
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so that the estimate of X4 is 

r « W _ 3^) 4. 12(211) _ 6(1111) 

* n n® 

Since power products are difficult to compute directly, it is necessary to 
express the estimates in terms of power sums. Dwyer [6; 30-33] gave a com- 
plete discussion of the problem of expanding power products in temos of power 
sums and also gave tables of power products in terms of power sums for 
weights g 6. By use of (12) it is also possible to use tables giving monomial 
S3unmetric functions in terms of power sums. One table by J. R. Roe [14; 
plate 18] includes all cases of weight ^ 10. 

By use of such a table we find 

(31) = -(4) -b (3)(1), 

(22) = -(4) -b (2)(2), 

(211) = 2(4) - 2(3)(1) - (2)(2) + (2)(1)*, 

(1111) = -6(4) -b 8(3)(1) -b 3(2)(2) - 6(2)(1)* -b (l)^ 

If these results are substituted in Li above and like terms are collected, it is 
found that 

n^*^Li = n*(n -b 1)(4) - 4n(n -b 1)(3)(1) - 3n(n - 1)(2)’ -b 12n(2)(l)* - 6(l)^ 
a result which agrees with that given by R. A. Fisher [12]. 


4. The Dwyer Double Expansion Theorem. The Dwyer double expansion 
theorem, [6; 34] and [11 ; 37-39], states that if any isobaric sum of power products 
of weight r indicated by 


( 20 ) 


rl 


(qiiy 


(9»!)'' vi! 






qn 


be expanded in terms of power sums in a form indicated by 


( 21 ) 


rl 




then the coefficient a, of the power sum (r) is given by 

(P- l)lrl • 


( 22 ) 


o, = S(-l)'"' 


(pi!)'' ••• (p.!)’V,!...T.r’'‘ 
and that the coefficient ar,-- r» of (ri)(ri) • • • (rm) is 


• • • bp*, , 


( 23 ) Or...... = OrjOr, ... <Jr_. 

The barred product indicates a symbolic multiplication by suffixing of sub- 
scripts which is exemplified by 


OtOt •“ (bt — Sbn + 26 m)( 6 * ~ 6u) = h»s ”* h»u “ Sinn -b Sbim ~ 2buiu * om. 
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The application of this theorem to the present problem eliminates the use of 
tables and permits the independent computation of the coefficient of any particu- 
lar products of power sums in the expansion in terms of power sums of any given 
estimate. The illustration given by Dwyer [13; 39, 40] exemplifies both of 
these points very well. 

5. Estimates of all Seminvariants of Weight ^ 8. If the estimates of any 
complete system of seminvariants and all products of those seminvariants up 
to and including weight r are known, then the estimates of all seminvariants 
of weight ^ r are obtainable as a linear combination of the^sc known estimates. 
For example, suppose that we know the estimates of all Thiele seminvariants 
of weight ^ 5 and wish to find the estimate of mb . Since mb = Xb + IOX3X2 , 

-ET'iMs] = = iT'iXd + lOiT'iXaXs] = L5 + 10L82 . 

In table II are givcm the estimates of all Thiele seminvariants and all products 
of Thiele seminvariants of weight ^ 8. From this table the expressions for Ls 
and Laa arc obtained and, by taking the combination indicated above, it is 
seen that 

= (n' - 5n' + 10n^)(5) - 5(n' - + 10n)(4)(l) 

- 10(n' - n)(3)(2) + 10(n' - in + 8)(3)(1)' 

+ 30(n 2)(2)^(1) - 10n(2)(l)^ + 4(l)^ 

a result which checks with that given by Dwyer [13; 27]. In similar fashion 
the estimate of any other seminvariant of weight ^ 8 can be obtained by use 
of table II. 


6. Computation Checks. There are a number of checks which can be applied 
to the entries in table II. These may be of interest simply as properties of the 
estimates, and they may be of use in correcting errors which may possibly have 
crept into the tables. 

When any power product of more than one part is expanded into power 
sums, the sum of the numerical coefficients of the expansion is zero. To prove 
this we need only to consider a set of observations of which one observation is 
unity and the rest are all zero. Then any power product of two or more parts 
is necessarily zero and all power sums are equal to unity. Hence the initial 
statement of the paragraph follows immediately. 


From this fact it is apparent that the sum of the coefficients of Lr is - , and 

n 

the sum of the coefficients of Lrjr,...r. is zero. Thus for L4 we have 


n* -f- n* ~ 4(n^ + n) — 3(n^ — n) + i2n — 6 _ 1 


n 


( 4 ) 


n 


, and for L22 the sum of the 


coefficients is 


+ n -f 4n — 4 + n* — 3n -b 3 — 2n + 1] = 0. 



TABLE II 

Estimates of All Thiele Semitwariants and Their Products of Weight £ 8 

w = 4 n<«X,4 n<*>LM w - 6 n<‘>I, n<*^La 

(4) n* + n» -{n«-n) (5) n‘ + 5n» -(n» - n«) 
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A condition satisfied by the coefficients of any seminvariant is that their sum 
is equal to zero (See section 2 ). This provides another check on the entries of 
table II, although the He miTi va.ria.nt must be written in homogeneous form 
before the check is applied. Thus we may write 


r 

n 

_ 


(n + 1) S - 4(n + 1) 

n 7\, 


n* n* 



and the sum of coefficients is 


(n + 1) — 4(n + 1) — 3(n — 1) + 12n — 6 n = 0. 

Several checks arise from the fact (sec section 6 ) that every seminvariant 
must be annihilated by the operator 

(24) 

.-1 08 i 

Another check results from the discussion of the next section and is so apparent 
as to need no comment. 

All the checks mentioned in this section are applicable to the estimate of any 
seminvariant. 


7. Estimates as Sums of Simple Seminvariants. A seminvariant such as Lk 
in which the coefficients of the m '’8 are functions of n will be called a composite 
seminvariant, while a seminvariant in which the coefficients of the m'’s are 
purely numerical will be called simple. The fact that is to be established in 
this section is that every composite seminvariant is the sum of simple semin- 
variants. As an illustration consider L 4 . It is apparent that 


L4 = 


I -L ”* t 

„(4) „(4) **> 


where I 4 and are seminvariants of the sample corresponding to X 4 and «« . 
Both I 4 and k^ are simple seminvariants. 

That a composite seminvariant may always be expressed as a sum of simple 
seminvariants can be demonstrated by considering the effect of Q', (24), on a 
composite seminvariant. The coefficients are polynomiab in n and are un- 
affected by the operator. The expression resulting from application of the 
operator can vanish only if the coefficient of rC vanishes for every r. Thus a 
composite seminvariant which has r different powers of n appearing in its coeffi- 
cients is expressible as the sum of r simple seminvariants, which are not neces- 
sarily distinct. Table III exhibits the estimates of Thiele seminvariants of 
weight ^ 6 as sums of simple seminvariants. 

Since the factors, appearing in front of each of the simple seminvariants in 
the expression resulting from breaking down a composite seminvariant, are of 
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successively lower order with respect to n; it is possible to obtain approxima- 
tions of various orders to the value of an estimate by using the appropriate 
portion of the expression given in the table. 

8. The Sstimates of Ihe k’s. The seminvariant k, possesses an interesting 
property which will be called invariance under estimate. By this is meant that 
the estimate of kt is X;r multiplied by a suitable factor. In particular, ki = M end 
xi B Ms and it is well known that 




TOt, = -Ti, 


SO that the k , certainly possesses the property for r = 2 and 3. It can be shown, 
however, that 


Ktr = ~ K* 




From (15) 


so that 


1 ^ /2r\ / t 


By the Binet- Waring identities [15; 6-7] 

(26) (a.6) = (o)(5) - (o + 6) 

and this holds for power products regardless of the values of a and 6. Hence 

n 2 \% / 

(2r) , _ 1 V CO ^ 1 V ('^r\ (f)(2r - i) 

n L n-lj'^2«^ \i ) n® ’ 

Since 

the coefficient of above is - and it follows immediately that 
n n — 1 

BT _ 1 /2r'\ (t)(2r — i) _ »* p- 

2i=S^ \i) n® n®^*" 

This proves the first half of (25) and the second half can be proved in s imilar 
fashion, although with considerably more difficulty. 



54 


PAUL L. DRBS8BL 


9. Other Simple Seminvariants which are Invariant under Eithnate. It has 

been previously remarked (Chapter I, section 2) that the * system of semin- 
variants are the seminvariants of minimum degree, those of even weight being of 
second degree and those of odd weight being of third degree. The xir’s are the 
only seminvariants of degree 2, but for odd weights greater than 7, there exist 
more than one seminvariant of degree 3. It is not difficult to show that these 
additional minimum degree seminvariants are also invariant under estimate. 
The type of proof used could have been applied equally well to obtain the results 
of the preceding section and indicates that the property of invariance under 
estimate which is possessed by the x’s is a direct result of their minimum degree 
property. 

Consider the estimate in power product form of any seminvariant of degree 3 
and odd weight. Power products of 1, 2 and 3 parts will appear. By the Binet- 
Waring identities each three part power product (abc) yields a third degree power 
sum product (a)(b)(c) plus other products of lower degree. Since (a)(b)(c) 
comes only from (o6c) its coefficient must be identical with that of (abc) and will 
therefore be a constant divided by The coefficient of each second degree 
product of power sums will be a sum of terms, the first of which comes from the 
corresponding two part power product with a coefficient identical with that of the 
power product, and the others come from the three part power products. Then 
the coefficient of a second degree product of power sums must be of the form 

Cl , Cj + Cj + • • • + c» _ Cin -1- Cj 

^(s) ■„{») „») • 


Similarly the coefficient of the first degree power sum term will be of the form 

din* + dtn + ds 


Since the estimate of a seminvariant is a seminvariant, it follows that ds ■ 0. 

This is true because the coefficient of ^ must be the coefficient of — 

n* n 

multiplied by — r. Furthermore c* = di = 0 for if the contrary be assumed it is 

immediately possible to break the composite seminvariant into two simple 

seminvariants, the first being of degree 3 (the original seminvariant) and the 

second of degree 2. Since for odd weights no seminvariant of degree 2 exists, 

it follows that any seminvariant of degree 3 and odd weight is invariant under 

estimate. It is also apparent that the factor must appear in the estimate. 


10. Composite Seminvariants which are Invariant under Estimate. For 

each weight r ^ 4 there exists a composite seminvariant which is invariant 
under estimate. For weights 4 and 5 this seminvariant is easily obtained by use 
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of Table III. Thus for weight 4, form the seminvariant X4 + c»Xt • From the 
table we find that 

^‘tX4 + c«Xi] = ^ I4 + ^, *4 + Cb It - cn ^~ - z~2yi ) 

*® "I" ~ n®Ca)A:4. 

If Cm “ the seminvariant is invariant under estimate. This seminvariant 
is 

(27) 1^4 = X4 + 4^^ 

In similar fashion we find for weight 5 

(28) — X$ + X»X|. 

For weights > 5 considerably more difficulty is encoxmtered. For weight 6, 
for example, we consider the seminvariant 

^ 3 

Xs + 042X4X2 -f- C13X2 “i" 0222X2. 

By u.se of table III we obtain 

« 

•F~*[X« + C42X4X2 + CmX* + C2MX2] ®= (U + + Cull + C2b12) + 4*, 


where 4* is a sum of other semiuvariants with coefficients which are functions of 
n and cn , Cu , cm . Now there are only four linearly independent seminvariants 
of weight 6 and it is necessary that one of these involve the term (l)*/n‘. By an 
argument analogous to that of the pnwious section this term cannot appear in 
41 and therefore 4> is expressible in terms of three or fewer seminvariants. Ac- 
tually three are necessary and equating the coefficients of these to zero the values 
of C42 , Cm and Cin are uniquely determined. The result is somewhat lengthy 
and scarcely of sufficient interest to record here. 

The same sort of procedure can be used for determining seminvariants of 
higher order which are invariant under estimate, but the labor of computation 
becomes very great. 

It is possible to obtain moment functions which are invariant under estimate 
by means of a set of equations given by Dwyer [13; 38-39]. These equations 
connect the coefficients of a general isobaric moment function and the coefficients 
of the expected value of that function. In his notation if, for example, 

/4 = 04(4) -I- 4o,i(3)(l) + 3om( 2)’ + 6a,u(2)(l)’ + Ou^(l)^ 

then 

E\f^ 64^144 + 4 i>iin**V»i*i + 36 Mn®Mi* + 662uW**'m»mi* ■!* , 
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wherein : 


(29) 


04 + 4081 + Som 4* 00211 + <3tini = 64 > 

081 + 3o2ii + Oiiii = bai , 

022 + 2 O 211 + flllll = ^22 , 

0211 + <^1111 — ^211 9 

fliiii = bull • 


The problem at hand demands that 




tj I ndi r 4w Oai — r — 

n 


+ on O22 “ I ^11 ""“5 T ^ ^mi 







= X[n04/14 + 471^^^031/13^11 + 37l^*^022M2^ + 071^*^0211 /l2 H“ ^^^^<^1111 


SO that the equations (29) become 

n*aiiu = Xn‘‘’auu , 

n’ojn = Xn***(ajii + flun)> 

n*ajj = Xw**’(asi + 2asn + Cun), 

= Xn'’\o8i + 3aju + ajuj)> 

no4 = Xn(o4 + 4osj + 3o» + 6aju + Ouu), 

and from these equations 04 , an , Oa , am can be found in terms of Omi . Ob- 
viously there is only one solution if none of the o’s are zero. In general, for any 
weight r, a similar system of equations can be found and they determine the 
coefficients of a moment function of weight r which is invariant under estimate. 
It appears that this moment function is always a seminvariant although no 
proof of the fact has been found. The moment functions of weight 4, 5 and 6 
obtained by this method are identical with 1 ^ 4 , and 4't defined above. 


Conclusion. The results of this paper include: 

1 . A demonstration of the fact that the theory of statistical seminvariants is 
identical with the theory of algebraic seminvariants. 

2 . The introduction of new statistical seminvariants. 

3. Simplification of the computation of estimates. 

4. Proof that the estimate of any seminvariant is also a seminvariant. 

5. Proof of the existence of a trio of seminvariants with the same numerical 
coefficients. 

6 . A discussion of semiinvariants which are invariant under estimate. 

Many thanks are due Professor P. S. Dwyer for his able guidance in the 
preparation of this paper and to Professors C. C. Craig and J. A. Nyswander for 
helpful comments. 
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THE ERRORS INVOLVED IN EVALUATING CORRELATION 
DETERMINANTS 

By Paul G. Hoel 

1. Introduction. Many statistical problems require for their solution the 
evaluation of correlation determinants. The method usually employed for such 
evaluation is that of Ohio/ in which the order of the determinant is reduced by 
successive operations with selected pivotal elements. The repeated multiplica- 
tions and subtractions involved in the method necessitate rounding off the 
elements in the successively reduced determinants. The calculated value of the 
original determinant is therefore in error; and so the question naturally arises 
as to the magnitude of this error. 

Previous attempts to answer this question seem to be satisfied with finding 
an upper bound for the magnitude of the difference between the value of the 
original determinant and its value after its elements have been rounded off. 
Moreover, this bound is expressed in terms of the errors in the elements and the 
minors of the original determinant, whose values are assumed to be known 
exactly from calculation. However, several reductions are often needed before 
the value of the determinant can be obtained; and furthermore the minors are 
subject to the same type of errors as the determinant itself. The problem, 
therefore, is to find an upper bound for the magnitude of the difference between 
the final calculated value of the determinant and the determinant itself which 
involves only calculated quantities. 

This paper treats the problem from two different points of view. In the first 
part an upper bound is obtained for the magnitude of the error. In the second 
part the first order error terms are given more detailed consideration, with the 
result that an upper probability bound is obtained for the error. 


2. Absolute Bounds. Consider the correlation determinant A == | |. To 

evaluate A by the method of Ohio, it is convenient to select diagonal elements 
as pivots. It will be assumed without loss of generality that the upper left 
diagonal element is always chosen as the pivotal element in each reduction. 
After each reduction, elements are rounded off to a fixed decimal accuracy. 
Let at, represent the element ijj after the fc-th reduction, ajjy the difference 
between the rounded value of element afy and afy itself. After k reductions, we 
arrive at the determinant 




k I h 
(^nn + Xnn 


^ See for example, Whittaker and Robinson Calculuk of Ob$ervatioiu, p. 71. 
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By treating F* ae a function of the e*, it may be expanded by Taylor’s formula 
as follows: 


(1) F* * A* + XiiA’li + XijXp^Aiipq + 

where A* is the value of F* for all zero, Ai,- is the cofactor of o*, in A*, etc. 

For a determinant of order n, the value of the determinant obtsuned after a 
single reduction is the value of the original determinant multiplied by the 
n — 2 power of the pivotal element used. Applsring this to F*, it follows that 

A" = (afe* + 

Ali 

dk ^ TTn—fc— 8 wA— 1 
i ipq ” A ^ iiPQ f 


etc., where the exponents of Ht are ordinary exponents rather than notation. 
Substituting in (1), 


F* = + m 


n-ifc~2 


+ |i« 


JH-l 


^a^pq 


fJ7p.+ 


In order to express F* in terms of the original determinant, this expansion 
will be condensed by means of the following operational notation. 

(2) F* = (1 + J9 + D* + • • • + D’’"*)/fr*"‘F*"S 


where D' operates on H*“*“‘F*“* by reducing the exponent of by i units, 

by summing from + 1 to n the product of i terms in ** with the corresponding 
cofactors of F*“‘, and dividing the result by factorial i. Using this as a recursion 
formula. 


F* » (1 + D + 


However, 


+ + • • • + • ■ • 

(1 + . . . + D""‘)Hi"-*F“. 


I Gil + *11 


F®= I 


= A, 


Oim + ®nn 


since we assume that a:,-,- = 0 for our ori^nal determinant. Consequently, 
F* » (1 + . . . + (1+ . . . + . . . 

(1 + • • • + 

Since B* operates on F*~* in (2) to extract the proper cofactor of i less rows than 
in F*~‘, which in turn reduces the exponent of all factors H*_i in the expansion 
of F*”* by i units, D' reduces the exponent of all H'b following it in the expansion 
of F* in (3) by i units. 
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Following these rules of operation, and expanding so as to collect terms of the 
same degree in the x’s, we may write 


F* = • • • 

(4) 

~*A -f nr*'* • • • 

H?"* (terms in x,/) •+• 

nn-k-i 

• • • Hi~* (terms in x^Xp,) -f 

Letting H = HiMk-\ • 

• ■ Hi and C = Hr*"* 

• • • Hi~*, we may write 

II 

1 

II 

C (terms in *, 7 ) -|- 

~ (terms in x^Xp,) + • • • j; 

and hence 




1 1 

(5) J — A = ^ (terms in %) + (terms in xaXp,) + • • • . 

Now J is the difference between the calculated value of A, using Ohio’s reduc- 
tion method and rounding off after each reduction, and the true value of A. 
We are interested in finding an upper bound for the magnitude of J. To ac- 
complish this we shall first overestimate the number of terms in the various 
sums of (5), then find an upper bound for the magnitude of the terms in these 
sums, and finally combine the two results. 

In counting terms by means of (3), we may ignore the H's since they merely 
serve as coefficients of the a:’s. Therefore consider the nature of the terms in 

(!-(-•••+ D”'*)(l + . • • + Z)’‘-‘)A. 


Now (1 + • • • + jy)^ contains the .sums ^ a:</A, 7 , — zz 

^ w— #+1 Z I n— *+1 

hence it contains s* terms in a:,-,- , terms in XijXpt , etc. Each of these 

z 


is not greater than s*, ,tCt , etc. ; consequently, the number of terms of each type 
is not greater than the coefficient of the corresponding power of D in the expan- 
sion of (1 + D)'*. Therefore, 


(6) (1 -f- D)'"-«’(l + ... (1+ = (!-)- Z))", 


where m = (n — A)* -f- • . . (n — 1 )*, contains at least as many terms of each 
type as are found in the expansion of F*. This gives us the desired overestimate 
of the number of terms in the various sums of (5). 

In finding upper bounds for the magnitudes of terms, it is to be noted that (4) 
is written with all common factors extracted from each set of terms of the same 
degree in the a:’s. In the parenthesis containing terms consisting of the product 
of r a:’s, the first sum will have unity for its coefficient while the last sum wUl have 
HlHi-i ■■■ HI as coefficient, with all sums between having as coefficients prod- 
ucts of H’a with exponents < r. Hence an upper bound for all coefficients in 
this parenthesis may be written as S', where S is the magnitude of the product 
of those H’b whose magnitude is greater than unity, but unity if none exceeds 
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unity. Now terms in xu are multiplied by An , those in *<,*»« > etc*; 

therefore let S.y , A<yp, , etc., be the absolute values of the largest in magnitude 
of such cofactors. With this notation for upper bounds for magnitudes of 
terms, and (6) giving an upper bound for the number of terms, we may write an 
upper bound for the magnitude of 7 as follows: 

(7) > I ^ (I *) + (I + . . . , 

where e > | a: | is the maximum error of rounding. This result is valid for any 
determinant with real elements. All quantities on the right are available from 
calculations except the A; consequently this upper bound will be useful only if 
satisfactory bounds exist for the minors of the determinant. It can be shown 
that (7) holds for any minor of A, say Au« , if the A have uv added as subscripts; 
and therefore it may be applied to the question of the accuracy of least square 
solutions. 

For the correlation determinant A it can be shown that the magnitude of a 
k is bounded by fc!/2** for k even and for k odd. 

substituting these bounds in (7), 


minor of order n — 

Setting a — yji and 
H 


\J \ < am + a^mCi ^ + a^mCz ^ ^ • • • 


^ .am .am .am . 
<am+ — + — + -^ + 


( 8 ) 


2 2 8 8 
. , a m , am 


for am < 1. Since am is obtainable from the calculations for A, this is the 
desired upper bound for the error in question. 


3. Probability Bounds. In order to find probability bounds for this error, 
it will be necessary to expand the //'s since they involve the variables x. Con- 
sider Hk = atk^ + . Since came from repeated reductions of A, it is 

expressible in terms of the and the minors of A. To obtain this expansion of 
Hk consider 






au + xtk 


Using the same methods as for F*, this may be written as 
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where b‘ is the value of G* for all ** * zero, etc., and where B‘ « 
B\i « etc. Substituting, 


G* = H‘C\G’^^ + Ht.\ E *JrG'5T' + EE + . • 


Using operational notation here also, this may be written as 

G* = (1 + + J5' + . . . + 

where the JE^s operate the same as the D's, except that sums are taken from 
A; — s + 1 to A; rather than from n — s + 1 to n. Treating this as a recursion 
formula, 

F* = G* = (1+ E)HU{X + E + E*-')H^®G^ 

However, 



Uii + Xii 


ail 


• 

• 

dkk + Xkk 


akk 


Consequently, 

(9) = (1 + E)Hl-x(l + E + E^)HU ... (1 + ... + A* . 


Since the E's operate on the following //\s to reduce their exponents, the number 
of terms of various types, that is, of various degrees in the x^s, will not be de- 
creased if the order of H's Is disregarded and their exponents held fixed. There- 
fore consider 

(10) Hi = {I + E)(l + E + E^) (I + ■■■ + E^-^)A,Hl.i . . . hT^ 


as an ordinary recursion formula in the H*s for overestimating the number of 
terms of various types. If (10) is substituted for successive //'s within itself 
in a systematic manner until no H's remain, it will be found that 

fli = (1 + i5) • • • (1 + • • • + ^)Aa: 

[(1 + ^) . . . (1 4- • • • + •••[(!+ ^A,]**'‘[A,]**“‘. 

To merely count terms it is permissible to combine like terms to give 

Hi = (1 + jj)>+*+»’+ - +**-‘(i + E + £*)*+*+*•+•• +**-• ... (1 ^ ... + 

= (1 + Ef'\l + E + E*f~* ...(] + ...+ E'‘-^)K, 

where K is the product of the A's. Since the E’b operate like the D’a, the same 
arguments as those used to arrive at (6) may be used to replace (1 + H • 
+ H*) by (1 + E)‘* for overestimating the number of terms. Hence, the number 
of terms of various t 3 rpe 8 in H* is not greater than those in 

(1 + + H)*‘-**“* . . . (1 + » (1 + 
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where tc* = 2* * + 2*-2*"* + • • • + (A — 2)*-2“ + {k — 1)*. Therefore the 
number of terms of various types in • • • Hi ~* is not greater than in 


^ j^j(n-A!-l)w>*+(n-fc)u>jk-i4-*-+(n-2)wj * ^ £jy 


It is easily shown that t can be condensed into the form 

(13) t « [2*~*(n - fc)-l] + 2*(2*~*(n - Jk)^l] + . . . +(fc - l)*[2®(n - k)-l]. 

From (3) it is evident that the number of terms of various types in $F_k$ will not
be greater than those in the expansion of $F_k$ when the exponents of the H's
are held fixed. But from (6) we have an upper bound for the number of terms
arising from the D's, and from (12) those arising from the H's; hence the number
of terms in question will certainly be bounded by those in


(14) 


(1 + D)"-"‘ = (1 + Dy. 


Now consider the magnitude of terms. The terms arising from the operation
of the D's contain minors of A as factors, while those arising from the operation
of the E's contain minors of $A_i$, where i ranges from 1 to k. Let $A'_{ij}$, etc., denote
an upper bound for the magnitudes of all such minors with the same number of
subscripts. It is easily shown that A' with 2r subscripts is not less than the
magnitude of the product of several minors whose subscripts total 2r in number.
The terms of various types also contain as factors products of the constant
terms in the H's. The constant term in $H_k$, which will be denoted by $h_k$,
can be obtained from (11) by operating with all ones, since it is unaffected
by disregarding the order of operation. Hence,

hk = AkAk-%^k-4 • • • aJ a* 


Since the $A_i$ are principal minors of a positive definite determinant with no
element greater than unity, $h_k$ has unity for an upper bound. Thus, an upper
bound for the magnitude of any term in the product of i x's will be $\epsilon^i$ times A'
with 2i subscripts.

With upper bounds now available for the number of terms and the magnitudes
of terms, we are in a position to consider the complete expansion of I, in
which the coefficients of the x's will be constants rather than H's. Evidently
the terms of first degree in the x's will come from the corresponding terms of (4) with the H's replaced by
the constant terms in their expansions. If Z denotes these terms, then

(16) ^ ‘ ^ 

+ • • • + A* • • • A* 23 J. 


Now consider an upper bound for |I − Z|. Since I − Z involves only terms
in the product of two or more x's, we need consider an upper bound for such
terms only. From the results of the two preceding paragraphs, we obtain

I / — Z I < e*,iC»A(,p, + 4- • • • • 
But from the paragraph containing (8), bounds are available for the A'; hence


\I-Z\< e%C, + + ... 


t J SI 

< *_£ + _JJL _ = $ 
- 2 ^2(1-€m) ’ 


for εm < 1. Since Z is of order ε, Φ will ordinarily be small compared with Z;
therefore consider the nature of the distribution of Z.

If we write $Z = a_1x_1 + \cdots + a_px_p$, then, since the x's are independently
distributed with rectangular distributions, it is easily shown that
$\mu_2 = \frac{\epsilon^2}{3}\sum a_i^2$, $\alpha_3 = 0$, $\alpha_4 = 3 - \frac{6}{5}\sum a_i^4 \big/ \left(\sum a_i^2\right)^2$.
If the a's are approximately equal in magnitude, then $\alpha_4$ is approximately
equal to $3 - 1.2/p$. But from (15)
P ^ §(w — A:)* + • • • + ~ 1)*» which is sufficiently large for determinants 

employing Chio's method to justify the assumption that Z is approximately
normally distributed. Setting L =



< “ ((n — A;)* + ... + (n — 1)* — i{(n — A:) + ... + (n — 1)*)] 


< ~[(n - A;)* + ••• + (n - D* - j(2n - fc - 1)] = 

Hence, the probability is > .95 that |Z| < 2ψ. Since |I − Z| < Φ, the
probability is > .95 that |I| < 2ψ + Φ; and therefore the probability is > .95
that

(16)  $|J| < \dfrac{2\psi + \Phi}{C}$.
C 


This inequality will usually give a smaller bound for |J| than (8). However,
when A is small the H's may be small, with the result that C will be small
and (16) may not give a satisfactory bound for |J|. In such cases the bound
given by (8) may not prove satisfactory either.


4. Example. Consider a correlation determinant of order 7 in which the
elements are accurate to 4 decimal places. If Chio's reduction method is
applied until a 2-rowed determinant is obtained, then n = 7, k = 5, ε = .00005,
m = 90, ω = 176, ψ = .00005√(160/3), and we obtain from (8) that


\J\< 


(S).oo« + 



.00001 + 


'^Y __ .09000005 

,h) 1 - /omff/H 
where SjH is obtained from calculations involved in evaluating the
determinant. From (16) we obtain that the probability is > .95 that


$|J| < \dfrac{.0008}{C}$.


The relative advantage of the second inequality over the first depends on the
size of the pivotal elements, as does the usefulness of either inequality.

University of California at Los Angeles



THE CUMULATIVE NUMBERS AND THEIR POLYNOMIALS 

By P. S. Dwyer 

In a recent paper [1] the author has shown how the moments of a distribution
can be obtained from the last entries of cumulative columns with the use of
multiplication by certain numbers. These numbers may be called “cumulative
numbers.” It is the aim of this paper to show how these numbers can be
obtained from the expansion of x^s in terms of factorials of the s-th order and to
demonstrate properties of the polynomials of which these numbers are the
coefficients.

TABLE 1

Successive Frequency Cumulations

  (1)     (2)    (3)     (4)     (5)      (6)      (7)      (8)
 a + x     x      f      C¹      C²       C³       C⁴       C⁵

 0 + 6     6     64      64      64       64       64       64
 0 + 5     5    192     256     320      384      448      512
 0 + 4     4    240     496     816     1200     1648     2160
 0 + 3     3    160     656    1472     2672     4320     6480
 0 + 2     2     60     716    2188     4860     9180    15660
 0 + 1     1     12     728    2916     7776    16956    32616
 0 + 0     0      1     729    3645    11421    28377    60993


1. The values $C_i^j(u_x)$. We use the notation $C_i^j(u_x)$ of the previous paper
[1, 289] to express the columnar cumulated entries. The j indicates the order
of the cumulation while the i indicates the number of the term, counting from
the bottom of the column. Thus in Table 1, which presents the cumulations
of a frequency distribution used in the previous paper [1, 289], $C_1^1 = 729$;
$C_1^2 = 3645$; $C_2^2 = 2916$; ⋯; $C_4^5 = 6480$, etc. Now if k + 1 values of x are spaced at
unit distances and if the smallest value of x is 0, it can be shown that


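$C_1^1 = \sum_0^k u_x$;  $C_1^2 = \sum_0^k (x + 1)u_x$;  $C_1^3 = \sum_0^k \frac{(x + 2)(x + 1)}{2!}u_x$;

and, in general, for j > 0 and j + 1 > i,

(1)  $C_i^j = \sum_0^k \frac{(x + j - i)^{(j-1)}}{(j-1)!}\,u_x$.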
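Relation (1) can be checked mechanically against Table 1. The sketch below is an illustration under stated assumptions, not the author's procedure. It takes the frequencies f_x of Table 1 (x = 0, ..., 6) and performs each cumulation as running totals from the top of the table down. It then compares the entries with (1), using the fact that (x + j − i)^{(j−1)}/(j − 1)! equals the binomial coefficient C(x + j − i, j − 1). The function names are hypothetical.

```python
from math import comb

# Frequencies f_x of Table 1, indexed by x = 0, 1, ..., 6
# (the table lists them from the bottom row x = 0 upward).
F = [1, 12, 60, 160, 240, 192, 64]


def cumulate(column):
    """One cumulation: running totals taken from the top of the table
    (largest x) downward; the result is again indexed by x."""
    out = [0] * len(column)
    total = 0
    for x in range(len(column) - 1, -1, -1):
        total += column[x]
        out[x] = total
    return out


def cumulations(freqs, order):
    """Return {j: column C^j} for j = 1, ..., order, each indexed by x."""
    cols, col = {}, freqs
    for j in range(1, order + 1):
        col = cumulate(col)
        cols[j] = col
    return cols


def c_formula(freqs, i, j):
    """Formula (1) with smallest x = 0, for j > 0 and i <= j:
    C_i^j = sum_x binom(x + j - i, j - 1) * f_x."""
    return sum(comb(x + j - i, j - 1) * fx for x, fx in enumerate(freqs))


cols = cumulations(F, 5)
# C_i^j is the i-th entry counted from the bottom, i.e. the row x = i - 1.
print(cols[1][0], cols[2][0], cols[2][1], cols[5][3])  # 729 3645 2916 6480
print(all(c_formula(F, i, j) == cols[j][i - 1]
          for j in range(1, 6) for i in range(1, j + 1)))  # True
```

Counting from the bottom, the i-th entry of a column sits in row x = i − 1, which is why the comparison reads `cols[j][i - 1]`.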
Similarly if k values of x are spaced at unit distances and if the smallest value
of x is 1, it can be shown that

$C_1^1 = \sum_1^k u_x$;  $C_1^2 = \sum_1^k x\,u_x$;  $C_2^2 = \sum_1^k (x - 1)u_x$;

$C_1^3 = \sum_1^k \frac{(x + 1)x}{2!}u_x$;  $C_2^3 = \sum_1^k \frac{x(x - 1)}{2!}u_x$;  $C_3^3 = \sum_1^k \frac{(x - 1)(x - 2)}{2!}u_x$;

and, in general, for j > 0 and j + 1 > i,

(2)  $C_i^j = \sum_1^k \frac{(x + j - i - 1)^{(j-1)}}{(j-1)!}\,u_x$.

It is to be noted that the coefficients of $u_x$ in (2) could be obtained from the
coefficients of $u_x$ in (1) by the substitution x + 1 = x'.

2. The powers in terms of factorials of the s-th order. If the s-th powers can
be expressed in terms of factorials of the s-th order (factorials having s factors),
then the moments can be expressed in terms of the cumulations. For example,
since $x^2 = [(x + 1)^{(2)} + x^{(2)}]/2!$, we have from (1)

$\sum_0^k x^2 f_x = \sum_0^k \frac{(x + 1)^{(2)} + x^{(2)}}{2!} f_x = C_2^3 + C_3^3$.

And since $x^3 = [(x + 2)^{(3)} + 4(x + 1)^{(3)} + x^{(3)}]/3!$, we have

$\sum_0^k x^3 f_x = \sum_0^k \frac{(x + 2)^{(3)} + 4(x + 1)^{(3)} + x^{(3)}}{3!} f_x = C_2^4 + 4C_3^4 + C_4^4$.

In general if

(3)  $x^s = \frac{A_{s1}(x + s - 1)^{(s)} + A_{s2}(x + s - 2)^{(s)} + \cdots + A_{sj}(x + s - j)^{(s)} + \cdots + A_{ss}x^{(s)}}{s!}$,

then

(4)  $\sum_0^k x^s f_x = A_{s1}C_2^{s+1} + A_{s2}C_3^{s+1} + \cdots + A_{sj}C_{j+1}^{s+1} + \cdots + A_{ss}C_{s+1}^{s+1}$,

while if the smallest value of x is 1, we have

(5)  $\sum_1^k x^s f_x = A_{s1}C_1^{s+1} + A_{s2}C_2^{s+1} + \cdots + A_{sj}C_j^{s+1} + \cdots + A_{ss}C_s^{s+1}$.

These quantities, $A_{sj}$, in (4) and (5) are simply the coefficients of certain
factorials of the s-th order in the expansion of $x^s s!$.
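Formula (4) can be checked numerically on the same data. The sketch rebuilds the cumulative columns of Table 1, repeating the code so the block stands alone. It takes A_{21} = A_{22} = 1 and A_{31} = 1, A_{32} = 4, A_{33} = 1 from the expansions of x² and x³ given above. It then compares Σ A_{sj} C_{j+1}^{s+1} with Σ x^s f_x computed directly. The helper names are illustrative only.

```python
# Frequencies of Table 1, indexed by x = 0, ..., 6.
F = [1, 12, 60, 160, 240, 192, 64]


def cumulate(column):
    """Running totals from the largest x downward, indexed by x."""
    out, total = [0] * len(column), 0
    for x in range(len(column) - 1, -1, -1):
        total += column[x]
        out[x] = total
    return out


COLS = {}
col = F
for order in range(1, 5):
    col = cumulate(col)
    COLS[order] = col


def C(i, j):
    """C_i^j: the i-th entry from the bottom (row x = i - 1) of the j-th cumulation."""
    return COLS[j][i - 1]


# A_{s1}, ..., A_{ss} read from the expansions of x^2 and x^3 in the text.
A = {2: [1, 1], 3: [1, 4, 1]}


def moment_via_cumulations(s):
    """Formula (4): sum_x x^s f_x = A_{s1} C_2^{s+1} + ... + A_{ss} C_{s+1}^{s+1}."""
    return sum(A[s][j - 1] * C(j + 1, s + 1) for j in range(1, s + 1))


for s in (2, 3):
    direct = sum(x ** s * fx for x, fx in enumerate(F))
    print(s, moment_via_cumulations(s), direct)
# 2 12636 12636
# 3 57996 57996
```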
These numbers, for small values of s, are easily obtained. It is possible to
use the table and a recursion formula of a previous paper [1, 264-296] for larger
values of s. It is also possible to obtain these values, without involving cumulative
theory, from (3) above.

While doing this we make a more general approach by expanding $(a + x)^s$
in terms of these same factorials, with the coefficients now functions of a. This
is possible if we add an additional term, $A_{s0}(x + s)^{(s)}$, to the numerator of the
right hand side of (3). We have then

(6)  $(a + x)^s = \frac{A_{s0}(x + s)^{(s)} + A_{s1}(x + s - 1)^{(s)} + \cdots + A_{ss}x^{(s)}}{s!}$.


The determination of the values $A_{sj}$ can be accomplished by purely algebraic
means by successive substitution of x = 0, 1, 2, ⋯, s. In this way we obtain
s + 1 equations in s + 1 unknowns. For example, when s = 2,

$(a + x)^2 = \frac{A_{20}(x + 2)^{(2)} + A_{21}(x + 1)^{(2)} + A_{22}x^{(2)}}{2!}$,

so that when x = 0, 1, 2, we have

$a^2 = A_{20}$;  $(a + 1)^2 = 3A_{20} + A_{21}$;  $(a + 2)^2 = 6A_{20} + 3A_{21} + A_{22}$.

The solution is $A_{20} = a^2$; $A_{21} = 2ab + 1$; $A_{22} = b^2$, where b = 1 − a. It
follows that

$(a + x)^2 = a^2\frac{(x + 2)^{(2)}}{2!} + (2ab + 1)\frac{(x + 1)^{(2)}}{2!} + b^2\frac{x^{(2)}}{2!}$,

so that

$\sum_0^k (a + x)^2 f_x = a^2C_1^3 + (2ab + 1)C_2^3 + b^2C_3^3$,

as indicated in the previous paper [1, 293].

When a = 0, then b = 1 and we have $\sum x^2 f_x = C_2^3 + C_3^3$, while when a = 1, b = 0 and the right
hand side becomes $C_1^3 + C_2^3$.

It follows that the general cumulative numbers might also be defined as the
solutions of the s + 1 equations in the s + 1 unknowns obtained by placing
x = 0, 1, 2, ⋯, s in (6).
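The substitution method is easy to mechanize because the system is triangular. At x = m, every factorial (m + s − j)^{(s)} with j > m vanishes, and the coefficient of A_{sm} is s!/s! = 1. A minimal sketch follows, assuming exact rational arithmetic through Python's `fractions` module. The function name `cumulative_numbers` is hypothetical.

```python
from fractions import Fraction
from math import comb


def cumulative_numbers(s, a=0):
    """Solve the s + 1 equations obtained from (6) at x = 0, 1, ..., s.

    At x = m:  (a + m)^s = sum_{j <= m} A_{sj} (m + s - j)^(s) / s!
                         = sum_{j <= m} A_{sj} binom(m + s - j, s),
    so the A_{sj} follow one at a time by forward substitution.
    """
    a = Fraction(a)
    A = []
    for m in range(s + 1):
        known = sum(A[j] * comb(m + s - j, s) for j in range(m))
        A.append((a + m) ** s - known)
    return A


print(cumulative_numbers(3))                  # A_30, ..., A_33 for a = 0
print(cumulative_numbers(2, Fraction(1, 3)))  # a^2, 2ab + 1, b^2 with b = 1 - a
```

For a = 0 the first entry A_{s0} = 0^s vanishes (s > 0), which is why (3) needs no A_{s0} term.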


3. The evaluation of the cumulative numbers. Formal algebraic methods of
evaluating equations (6) are somewhat tedious, so we use finite difference theory
to aid in finding the solution. As in the previous paper [1] we use the notation
$\nabla v_x = v_x - v_{x-1}$, with $v_x$ taken as given when $0 \le x \le k$ and $v_x = 0$ otherwise. We then write, from (6),
(7)  $s!(a + x)^s = A_{s0}(x + s)^{(s)} + A_{s1}(x + s - 1)^{(s)} + \cdots + A_{sj}(x + s - j)^{(s)} + \cdots + A_{ss}x^{(s)}$.

We note further that $\nabla^{s+1}(x + r)^{(s)} = s!$ when x = s − r, and 0 otherwise. We have then

(8)  $\nabla^{s+1}(a + j)^s = A_{sj}$.

It has been shown in the previous paper [1, 292] that

(9)  $\nabla^{s+1}(a + j)^s = \sum_{i=0}^{j}(-1)^i\binom{s+1}{i}(a + j - i)^s$,

and it appears that the cumulative numbers could be defined by (9). A useful
recursion formula has been derived from (9):

(10)  $\nabla^{s+1}(a + x)^s = (a + x)\nabla^{s}(a + x)^{s-1} + (s + 1 - a - x)\nabla^{s}(a + x - 1)^{s-1}$.
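Formulas (9) and (10) give two independent routes to the same numbers, so each can serve as a check on the other. A minimal sketch, assuming integer a and the convention above that values at negative arguments are zero. `nabla_power` and `next_row` are hypothetical names.

```python
from math import comb


def nabla_power(s, j, a=0):
    """Formula (9): the (s+1)-th backward difference of (a + x)^s at x = j,
    with values for x < 0 taken as zero:
    sum_{i=0}^{j} (-1)^i binom(s+1, i) (a + j - i)^s."""
    return sum((-1) ** i * comb(s + 1, i) * (a + j - i) ** s for i in range(j + 1))


def next_row(row, s, a=0):
    """Formula (10) with s replaced by s + 1: from A_{s,0..s} build A_{s+1,0..s+1} via
    A_{s+1,x} = (a + x) A_{s,x} + (s + 2 - a - x) A_{s,x-1}, with A_{s,-1} = A_{s,s+1} = 0."""
    ext = list(row) + [0]
    return [(a + x) * ext[x] + (s + 2 - a - x) * (ext[x - 1] if x > 0 else 0)
            for x in range(s + 2)]


row = [nabla_power(0, 0)]   # s = 0: A_00 = 1
for s in range(4):
    row = next_row(row, s)
print(row)                  # [0, 1, 11, 11, 1]
```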

4. The cumulative polynomials. We define the cumulative polsmomials to 

be the polynomials obtained by using the cumulative numbers as coefficients. 
Thus when a = 0, 

Pi = y; P 2 = y + y^; P 3 = y + + j/*; -P 4 = y + llj/“ + llj/* + y*; etc. 

It is possible to derive a recursion formula for these polynomials. We use 

(10) with 8 replaced by s + 1 and o = 0 and get 

(11) P .+1 = 2V+V)j;^y* = + S(s + 2 - x)V* \x - D' f, 

which becomes, after some manipulation, 

(12) P.+1 = (1 - j/)SxV+‘j^y + (8 + l)yP. . 

To illustrate wo get Pa from Pj = y + 4y* + y*. Now Sa:V‘(®)*y® = y + 
8 y® + 3y’ and P 4 = (1 - y)(y + 8y* + 3y’) + 4y(y + 4y* + y*) = y + lly* + 
4y' + y*. The recursion formula (12) can be expressed also in the form of a 

differential equation, since P', = —■ (P.) = SxV*‘''’(x)*y*~*, as 

dy 

(13) P .+1 = y[(l - y)P: + (8 + 1)P.]. 

It can be shown more generally that for any a 

P«.o = 1 ; Pa.i = 0 + 6y; P„,j = a* + (2o6 + l)y 4- hV , etc. with 

(14) P«..+i = y(i — y)Ptt,t + [ 0(1 ~ y) + (9 + i)y]Po.» 
as the recursion formula. 
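The recursion (14) can be run directly on coefficient lists. The sketch below is a plain-Python illustration, with coefficients stored lowest degree first. The helper names are assumptions, not notation from the paper. It regenerates P_3 and P_4 for a = 0 and the general P_{a,2} for a sample rational a.

```python
from fractions import Fraction


def add(p, q):
    """Sum of two polynomials given as coefficient lists (lowest degree first)."""
    n = max(len(p), len(q))
    return [(p[i] if i < len(p) else 0) + (q[i] if i < len(q) else 0) for i in range(n)]


def times_y(p):
    """Multiply a polynomial by y."""
    return [0] + list(p)


def scale(p, c):
    return [c * v for v in p]


def derivative(p):
    return [k * p[k] for k in range(1, len(p))]


def next_cumulative_poly(P, s, a=0):
    """Formula (14): P_{a,s+1} = y(1 - y) P'_{a,s} + [a(1 - y) + (s + 1) y] P_{a,s}."""
    dP = derivative(P)
    term1 = add(times_y(dP), scale(times_y(times_y(dP)), -1))   # y(1 - y) P'
    term2 = add(scale(P, a), scale(times_y(P), s + 1 - a))      # [a + (s + 1 - a) y] P
    return add(term1, term2)


def cumulative_polys(s_max, a=0):
    """Return [P_{a,0}, ..., P_{a,s_max}], starting from P_{a,0} = 1."""
    polys = [[1]]
    for s in range(s_max):
        polys.append(next_cumulative_poly(polys[-1], s, a))
    return polys


print(cumulative_polys(4)[4])               # [0, 1, 11, 11, 1], i.e. y + 11y^2 + 11y^3 + y^4
print(cumulative_polys(2, Fraction(1, 3))[2])
```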


5. The numerator coefficients in successive derivatives of the logistic function.

Lotka has recently exhibited the coefficients of the numerator terms of successive
derivatives of the logistic function [2, 160]. These appear to be, aside
from sign, the same as the cumulative numbers when a = 0. It is shown in
this section that these numbers are the cumulative numbers. The scheme is
generalized to include the numerator coefficients of the derivatives of a more
general function involving the parameter a.


Lotka used the function $\phi_0 = \frac{1}{1 + e^{-rx}}$ and obtained

$\phi_1 = \frac{re^{rx}}{(1 + e^{rx})^2}$,  $\phi_2 = \frac{r^2e^{rx}(1 - e^{rx})}{(1 + e^{rx})^3}$, etc.

The numerical coefficients are the same if r = 1, so we might as well use
$\phi_0 = \frac{1}{1 + e^x}$.


A more general function is the two parameter function

(15)  $\phi_{a,c} = \frac{e^{ax}}{1 + ce^x}$.

Let successive derivatives with respect to x be indicated by $\phi_{a,c,1}$, $\phi_{a,c,2}$, $\phi_{a,c,3}$,
etc. Then

$\phi_{a,c,1} = \frac{e^{ax}[a - c(1 - a)e^x]}{(1 + ce^x)^2}$,

$\phi_{a,c,2} = \frac{e^{ax}[a^2 - (-2a^2 + 2a + 1)ce^x + (1 - a)^2c^2e^{2x}]}{(1 + ce^x)^3}$.

In general,

$\phi_{a,c,s} = \frac{e^{ax}Q_{a,c,s}}{(1 + ce^x)^{s+1}}$,

so that

$\phi_{a,c,s+1} = \frac{e^{ax}\{(1 + ce^x)[aQ_{a,c,s} + Q_{a,c,s}'] - (s + 1)ce^xQ_{a,c,s}\}}{(1 + ce^x)^{s+2}}$

and

(16)  $Q_{a,c,s+1} = (1 + ce^x)[aQ_{a,c,s} + Q_{a,c,s}'] - (s + 1)ce^xQ_{a,c,s}$.

The Q functions can be changed to polynomials with the substitution $e^x = y$.
Then derivatives are taken with respect to y and

(17)  $P_{a,c,s+1} = (1 + cy)[aP_{a,c,s} + yP_{a,c,s}'] - (s + 1)cyP_{a,c,s}$.

When c = −1, this becomes formula (14), and since $P_{a,c,0} = 1$, it follows that
the numbers of the present section are generalized cumulative numbers. When
c = 1 and a = 0 we have the numbers found by Lotka.

It can be shown, further, that the power of c in the coefficient of $y^j$ is $c^j$. It follows that the
absolute values of the coefficients, when c = 1 and when c = −1, are the same.
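The sketch below applies (17) to coefficient lists, again with hypothetical helper names. It reproduces the signed Lotka numerators for a = 0, c = 1. It can also be used to check two statements above: c = −1 gives the cumulative polynomials of (14), and changing c from −1 to 1 only changes signs.

```python
from fractions import Fraction


def add(p, q):
    n = max(len(p), len(q))
    return [(p[i] if i < len(p) else 0) + (q[i] if i < len(q) else 0) for i in range(n)]


def times_y(p):
    return [0] + list(p)


def scale(p, c):
    return [c * v for v in p]


def derivative(p):
    return [k * p[k] for k in range(1, len(p))]


def next_numerator(P, s, a, c):
    """Formula (17): P_{a,c,s+1} = (1 + cy)[a P + y P'] - (s + 1) c y P."""
    inner = add(scale(P, a), times_y(derivative(P)))   # a P + y P'
    first = add(inner, scale(times_y(inner), c))        # (1 + c y)(a P + y P')
    return add(first, scale(times_y(P), -(s + 1) * c))


def numerators(s_max, a, c):
    polys = [[1]]                                       # P_{a,c,0} = 1
    for s in range(s_max):
        polys.append(next_numerator(polys[-1], s, a, c))
    return polys


# a = 0, c = 1: numerators, in y = e^x, of the derivatives of 1/(1 + e^x).
print(numerators(3, 0, 1)[3])   # [0, -1, 4, -1]
```

The sign pattern has a short explanation. With z = cy, we have yP' = z dP/dz, so (17) no longer involves c. The coefficient of y^j is therefore a fixed number times c^j.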


6. Formulas for $\sum x^s$. A formula for the sums of the s-th powers of the
integers from 1 to k is obtained by summing (3). We get

(18)  $\sum_1^k x^s = A_{s1}\sum_1^k \frac{(x + s - 1)^{(s)}}{s!} + A_{s2}\sum_1^k \frac{(x + s - 2)^{(s)}}{s!} + \cdots + A_{ss}\sum_1^k \frac{x^{(s)}}{s!}$,

from which

(19)  $\sum_1^k x^s = A_{s1}\frac{(k + s)^{(s+1)}}{(s + 1)!} + A_{s2}\frac{(k + s - 1)^{(s+1)}}{(s + 1)!} + \cdots + A_{ss}\frac{(k + 1)^{(s+1)}}{(s + 1)!}$,

or

(20)  $\sum_1^k x^s = \sum_{j=1}^{s} A_{sj}\frac{(k + s - j + 1)^{(s+1)}}{(s + 1)!} = \sum_{j=1}^{s}\binom{k + s - j + 1}{s + 1}\nabla^{s+1}(j)^s$.

For example

$\sum_1^k x^2 = \frac{(k + 2)^{(3)} + (k + 1)^{(3)}}{3!} = \frac{k(k + 1)(2k + 1)}{6}$,

$\sum_1^k x^3 = \frac{(k + 3)^{(4)} + 4(k + 2)^{(4)} + (k + 1)^{(4)}}{4!} = \frac{k^2(k + 1)^2}{4}$.

More generally the values of $\sum x^s$ from x = a to x = a + k − 1 can be evaluated by

(21)  $\sum_{x=a}^{a+k-1} x^s = \frac{1}{(s + 1)!}\sum_{j=0}^{s}(k + s - j)^{(s+1)}\nabla^{s+1}(a + j)^s = \sum_{j=0}^{s}\binom{k + s - j}{s + 1}\nabla^{s+1}(a + j)^s$.
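Formula (21) is convenient to evaluate by machine, with the cumulative numbers taken from (9). The sketch assumes integer a and k ≥ 0 and compares the result with a brute-force sum. `power_sum` is a hypothetical name.

```python
from math import comb


def nabla_power(s, j, a):
    """Cumulative number A_{sj} for general a, from (9)."""
    return sum((-1) ** i * comb(s + 1, i) * (a + j - i) ** s for i in range(j + 1))


def power_sum(s, a, k):
    """Formula (21): sum of x^s for x = a, a + 1, ..., a + k - 1,
    computed as sum_j binom(k + s - j, s + 1) * A_{sj}(a)."""
    return sum(comb(k + s - j, s + 1) * nabla_power(s, j, a) for j in range(s + 1))


print(power_sum(2, 1, 10), power_sum(3, 1, 10))   # 385 3025
```

Taking a = 1 recovers (20), because A_{s,j−1}(1) = A_{sj}(0) and A_{ss}(1) = b^s = 0.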


7. Summary. It is shown how the cumulative numbers and the cumulative
polynomials may be obtained in a variety of ways. Of special interest is the
fact that the cumulative numbers can be obtained by expanding powers in
terms of factorials, and hence they might be called factorial coefficients of a
kind. It is also possible, though it is not within the scope of this paper, to
establish interesting relations between the cumulative numbers and the
multinomial coefficients, the usual factorial coefficients, the differences of 0, etc.

REFERENCES 


[1] P. S. Dwyer, “The Computation of Moments with the Use of Cumulative Totals,”
Annals of Math. Stat., Vol. IX, no. 4, Dec. 1938, pp. 288-304. A more extensive
bibliography is given in this paper.

[2] A. J. Lotka, “An Integral Equation in Population Analysis,” Annals of Math.
Stat., Vol. X, no. 2, June 1939, pp. 144-161.

University of Michigan 
Ann Arbor, Michigan 



ENUMERATION AND CONSTRUCTION OF BALANCED INCOMPLETE
BLOCK CONFIGURATIONS¹

By Gertrude M. Cox 

1. Introduction. One of the general problems of experimental design is to 
avoid extraneous effects in making desired comparisons. The method employed 
is to use experimental materials as nearly homogeneous as possible. Such 
materials, however, are seldom available in large quantities. On the contrary, 
field soils vary in fertility from block to block, animals vary with both litter and 
sex, and leaves on one young plant differ from those on another. Differences 
between blocks, between litters and sex, and between plants, being irrelevant 
to the comparisons usually contemplated, must be avoided. 

When the number of treatments to be compared is small, well known methods
of design, such as the Latin square or randomized complete block, are available
and efficient. As the number of treatments increases, however, these designs
tend to become less efficient through failure to eliminate heterogeneity.
Furthermore, they become cumbersome, the Latin square design requiring replicates
equal in number to the treatments and the complete block design requiring that
each treatment occur in every block. (Blocks are defined as an assemblage of
experimental units chosen to be as nearly alike as possible.)

Because of such limitations, several modifications of the complete block design 
have been devised. These new designs all have the common characteristic that 
the experimental material is divided into groups or blocks containing fewer units 
than the number of treatments to be compared. These more homogeneous 
small blocks are referred to as incomplete blocks. 

It is desirable to have all comparisons between pairs of treatments made with 
equal accuracy. This requires of the design that every pair of treatments 
occur in the same block an equal number of times. Such a design is referred to 
as balanced. Balanced incomplete block designs can be arranged (for any given 
number of treatments) only for certain combinations of block size and number of 
replications.²

The construction of balanced incomplete block designs is mathematically a 
part of the theory of configurations. A configuration is an assemblage of 
elements into sets, each element occurring in the same number of sets, and each 

¹ A revision of an expository paper presented under a different title at a joint meeting
of the Institute of Mathematical Statistics and Biometric Section of the American
Statistical Association, December 27, 1939.

² Numerous additional designs are available in the partially balanced incomplete blocks
[3].
set containing the same number of elements. The configurations to be
considered here are the complete configurations, i.e., those in which each element
occurs an equal number of times in the same set with every other element. It
would be useful to know (a) what configurations (within the useful range)
exist, and (b) how these configurations may be constructed.

The typical requirement of the experimenter is this: “I wish to test t
treatments and can use blocks of size k (t > k). I should like a design which will
involve as little experimental material as feasible.” The designer must then
determine what configuration of t elements in sets of k will satisfy the incidence
relation that each pair of elements occur together in a set an equal number of
times, and for which the total number of sets is a minimum. There are still
many configurations which the experimenter needs but which have not as yet
been constructed.

In order better to explain the construction of these balanced incomplete block
designs, it is essential to specify the underlying combinatorial problems. A
configuration satisfying the condition of balance can be obtained by writing
down all possible combinations, b, of the t elements taken k at a time.

The simplest example is that in which each set contains only two elements and
all possible combinations of the t elements, taken in pairs, appear in the different
sets. This series of pairs can be written out by the experimenter, and the
method of analysis is given by Yates [20].

Let us take another example; given six elements to be taken three at a time,

$b = {}_6C_3 = \frac{6!}{3!\,3!} = 20$.


The 20 combinations are:

123   134   146   236   346
124   135   156   245   345
125   136   234   246   356
126   145   235   256   456.


Such unreduced designs are not necessarily economical or feasible in experimental 
work. It is often desirable to find some less extensive configuration. In this 
example half of the combinations, either those in italics or the other half, fulfill 
the restriction that every element occur with every other element in the same 
number of sets. Each pair of elements occurs twice in either group of sets. 
Thus, a balanced incomplete block design can be based on either half of the 
20 sets as well as on all 20. 
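Balance can be checked by counting how often each pair of elements shares a set. The sketch below is illustrative. The ten triples in `HALF` form one balanced half, confirmed by direct counting. The italics of the original listing cannot be recovered here, so this is not claimed to be the half the author marked.

```python
from collections import Counter
from itertools import combinations


def pair_counts(blocks):
    """Number of sets in which each (unordered) pair of elements occurs together."""
    return Counter(pair for block in blocks for pair in combinations(sorted(block), 2))


ALL = list(combinations(range(1, 7), 3))   # the 20 combinations of 6 elements, 3 at a time
HALF = [(1, 2, 3), (1, 2, 4), (1, 3, 5), (1, 4, 6), (1, 5, 6),
        (2, 3, 6), (2, 4, 5), (2, 5, 6), (3, 4, 5), (3, 4, 6)]
OTHER = [block for block in ALL if block not in HALF]

for name, blocks in (("all 20", ALL), ("half", HALF), ("other half", OTHER)):
    counts = pair_counts(blocks)
    print(name, len(blocks), sorted(set(counts.values())), len(counts))
# all 20 20 [4] 15
# half 10 [2] 15
# other half 10 [2] 15
```

Every pair lies in 4 of the 20 triples. A half with each pair occurring twice therefore leaves a complementary half with the same property.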


2. Combinatorial methods. Combinatorial considerations of a simple nature 
enable us to set up necessary conditions which balanced designs must satisfy. 
We have t elements arranged in b sets of k elements each; each element occurs in r
sets, and each pair of elements occurs together in a set exactly λ times. Then
we must have

tr = bk,  r(k − 1) = λ(t − 1).

The first of these equations expresses the fact that the total number of plots
must be equal both to the product of elements by replications and to the product
of sets by number of elements per set; the second, that the number of pairs into
which a given element enters must equal λ times the remaining number of
elements.

It is convenient to write

$r = \frac{\lambda(t - 1)}{k - 1}$,  $b = \frac{\lambda t(t - 1)}{k(k - 1)}$.

Since the numbers b, r, k, λ must be integers, it is easy to obtain lower limits
for any three in terms of the other two.

To give a general classification, the configurations have been divided into
classes according to the value of λ. Because of the practical limitations in
experimentation, table I has been expanded only to λ = 6 and the k
values from 1-14. It may be well to call attention to the fact that duplications
occur in the different classes of table I. For instance, in the class λ = 1, for
k = 6, t = 15m + 1, and m = 1, then b = 8 and r = 3. In order to construct a
design, the following condition is necessary: r ≥ k, and therefore b ≥ t. In this
example the condition is not met; if b, r and λ are multiplied by 2, the resulting design
is t = 16, b = 16, r = 6, k = 6 and λ = 2. This configuration is a duplicate
of the design in the class λ = 2, for k = 6 and m = 1. In many of the
configurations where λ is 3, 4, 5, or 6, a common factor can be cancelled from b, r and
λ, giving a design listed in the classes λ = 1, 2 or 3.
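The two necessary conditions can be screened mechanically, together with r ≥ k (equivalently b ≥ t) from the paragraph above. A minimal sketch follows. Passing these checks does not establish that a configuration exists, as the examples that follow in the text show. The function names are hypothetical.

```python
from fractions import Fraction


def bibd_parameters(t, k, lam):
    """Return (r, b) satisfying t r = b k and r (k - 1) = lam (t - 1),
    or None when r or b fails to be an integer."""
    r = Fraction(lam * (t - 1), k - 1)
    b = Fraction(lam * t * (t - 1), k * (k - 1))
    if r.denominator != 1 or b.denominator != 1:
        return None
    return int(r), int(b)


def passes_necessary_conditions(t, k, lam):
    """Integrality plus r >= k (equivalently b >= t)."""
    params = bibd_parameters(t, k, lam)
    return params is not None and params[0] >= k


print(bibd_parameters(7, 3, 1))                # (3, 7)
print(bibd_parameters(16, 6, 1))               # (3, 8): integral, but r < k
print(passes_necessary_conditions(16, 6, 2))   # True: t = 16, b = 16, r = 6
```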

It should be emphasized that the conditions under which table I was derived 
are necessary, but not sufficient, for the existence of a complete configuration. 
For example, consider the following configurations which satisfy the necessary 
conditions for a design. 


Sub-class
(table I)     m      t      b      r      k      λ

10m + 5       1     15     21      7      5      2
21m + 1       1     22     22      7      7      2
15m + 6       2     36     42      7      6      1
42m + 1       1     43     43      7      7      1
45m + 10      2    100    110     11     10      1
110m + 1      1    111    111     11     11      1


No configurations of the above specification can actually be constructed. 

A selected group of configurations from table I is given in table II. Only
those configurations whose k, r and λ lie within practical limits, and whose
existence has not been disproved, have been included. The practical limits of
k, r and λ, of course, are dependent upon the conditions surrounding the
experiment. We have chosen to keep k within the range 3 to 10 except for a few special
configurations in which t is greater than 100, in which cases k was allowed to
equal 11-14. Also r has been kept within a similar limited range. (Those
configurations in table II with an asterisk preceding t have not been
constructed.)

TABLE I

The above limitations upon k and r give a small, selected group of
configurations. However, many others either have been constructed or are known to
exist. For balanced incomplete block designs, Yates [20] gives the lower limits
of r for t from 4 to 25 and k from 2 to 12 but not greater than ½t. Fisher and
Yates [8] have tabulated the configurations which are known to exist having
ten or fewer replications, including all arithmetically possible configurations the
existence of which has not been disproved.

Even if the existence of a configuration has not been disproved, there still
remains the difficult problem of writing out the elements which are to appear in
each set. Some discussion of the structure of such configurations is presented
by Fisher and Yates [8], by Yates [20, 21], by Goulden [9, 10] and by Bose [4].
Additional descriptions are to follow.

While a search of the literature revealed a number of constructed
configurations, yet the general theory of their formation has received relatively little
consideration. The question of combinations related to the theory of
configurations which is of interest here was first set forth by Kirkman [11] in 1847. He
states the problem thus: “If $Q_x$ denote the greatest number of triads that can be
formed with x symbols, so that no duad shall be twice employed, then

$3Q_x = x(x - 1)/2 - V_x$,

if for $V_x$ we put 0, when x = 6m + 1 or 6m + 3.” This gives the formula for b
which was given earlier in this article. Put x = t and $V_x = 0$:

$b = Q_t = \frac{t(t - 1)}{3 \cdot 2}$.

Besides the theory connected with these combinatorial problems, considerable 
information related to the construction of the configurations has been found in 
the literature on finite projective geometry, especially the geometry which applies 
to the theory of groups. 

An extensive discussion of the λ = 1 class of configurations (as listed in table I)
can be found in the literature. The theory of the formation of the configurations
for the sub-class t = 6m + 3 has been summarized by Ball [1]. This is the
Kirkman “school-girl problem” for which Eckenstein [7] lists 48 papers and 6
books written during the years 1847-1911 dealing with this subject. The
problem was first published in the Lady's and Gentleman's Diary for 1850 [12].
It is usually stated that “a schoolmistress was in the habit of taking her girls
for a daily walk. The girls were fifteen in number, and were arranged in five
rows of three each, so that each girl might have two companions. The problem
is to dispose of them so that for seven consecutive days no girl will walk with any
of her schoolfellows in any triplet more than once.” For this particular
sub-class (t = 6m + 3, k = 3), this type of configuration has been shown to exist
for every possible value of t.

TABLE II

Selected Group of Configurations
(Balanced Incomplete Block Designs)

   t      b      r    k    λ               t       b       r    k    λ
   7      7      3    3    1  Y.S.       *25      50       8    4    1
   7      7      4    4    2              25      30       6    5    1
   8     14      7    4    3              25    15+15      3    5    1  L.S.
   9     12      4    3    1             *25      25       9    9    3
   9     6+6     2    3    1  L.S.        28      63       9    4    1
   9     18      8    4    3              28      36       9    7    2
   9     18     10    5    5             *29      29       8    8    2
   9     12      8    6    5              31      31       6    6    1  Y.S.
  10     30      9    3    2             *31      31      10   10    3
  10     15      6    4    2             *36      45      10    8    2
  10     18      9    5    4              37      37       9    9    2
  10     15      9    6    5             *41      82      10    5    1
  11     11      5    5    2             *46      69       9    6    1
  11     11      6    6    3             *46      46      10   10    2
  13     26      6    3    1              49      56       8    7    1
  13     13      4    4    1  Y.S.        49    28+28      4    7    1  L.S.
  13     13      9    9    6             *51      85      10    6    1
  15     35      7    3    1              57      57       8    8    1  Y.S.
  15     15      7    7    3              64      72       9    8    1
  15     15      8    8    4              64    72+72      9    8    2  L.S.
  16     20      5    4    1              73      73       9    9    1  Y.S.
  16   20+20     5    4    2  L.S.        81      90      10    9    1
  16     16      6    6    2              81    45+45      5    9    1  L.S.
  16     16     10   10    6              91      91      10   10    1  Y.S.
  19     57      9    3    1             121     132      12   11    1
  19     19      9    9    4             121    66+66      6   11    1  L.S.
  19     19     10   10    5             133     133      12   12    1  Y.S.
  21     70     10    3    1             169     182      14   13    1
  21     21      5    5    1  Y.S.       169    91+91      7   13    1  L.S.
 *21     28      8    6    2             183     183      14   14    1  Y.S.
 *21     30     10    7    3

* Have not been constructed.
Y.S. Youden squares.
L.S. Lattice squares.

Most of the solutions were worked by H. E. Dudeney and O. Eckenstein. They are given by Ball [1] for all t's less than 100,
that is, for t = 9, 15, 21, 27, 33, 39, 45, 51, 57, 63, 69, 75, 81, 87, 93 and 99.
Ball describes several methods of constructing such configurations, as cycles,
combinations of cycles, scalene triangles inscribed in the circle, focal and
analytical methods. As an illustration of the school-girl problem, the construction
of the configuration for t = 9, b = 12, r = 4, k = 3 and λ = 1 will be shown.
Scalene triangles are inscribed in a circle with certain specifications (to be
fulfilled) giving the three sets of triplets for the first day as follows.

Set    Group I

(1)    ∞ 1 5
(2)    3 4 6
(3)    7 8 2.

By rotation or by cyclic substitution the other three groups are secured:

Set    Group II      Set    Group III     Set    Group IV

(4)    ∞ 2 6         (7)    ∞ 3 7         (10)   ∞ 4 8
(5)    4 5 7         (8)    5 6 8         (11)   6 7 1
(6)    8 1 3,        (9)    1 2 4,        (12)   2 3 5.


Then placing ∞ = 9, we have the configuration for t = 9, b = 12, and r = 4.
Note that in the school-girl problem the sets are grouped into complete
replications of the elements. This problem of 9 girls taken 3 at a time has been
subjected to an exhaustive examination. There are 840 arrangements but only one
fundamental solution. In the case of 15 girls, the number of fundamental
solutions, according to Mulder [14] and Cole [6], is seven. Ball mentions the
Kirkman problem in quartets, which is the sub-class t = 12m + 4, for k = 4.
He states that this has been solved for cases where m does not exceed 49. He
also states, “I conjecture that similar methods are applicable to corresponding
problems about quintets, sextets, etc.”
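The cyclic construction can be verified by machine. This sketch writes the fixed element ∞ directly as 9 and applies the cyclic substitution 1 → 2 → ⋯ → 8 → 1 to Group I. It is an illustrative check, and the names are hypothetical.

```python
from collections import Counter
from itertools import combinations

INF = 9  # the fixed element, written as infinity in the construction and then set equal to 9

# Group I (sets (1)-(3)); Groups II-IV add 1, 2, 3 to every finite element, cyclically on 1..8.
GROUP_I = [(INF, 1, 5), (3, 4, 6), (7, 8, 2)]


def shift(element, d):
    """Cyclic substitution on 1..8; the element INF is left fixed."""
    return element if element == INF else (element - 1 + d) % 8 + 1


groups = [[tuple(shift(e, d) for e in block) for block in GROUP_I] for d in range(4)]
for number, group in enumerate(groups, start=1):
    print(number, group)

pairs = Counter(frozenset(p) for group in groups for block in group
                for p in combinations(block, 2))
print(len(pairs), set(pairs.values()))   # 36 {1}: every pair of the 9 elements occurs once
print(all(sorted(e for block in group for e in block) == list(range(1, 10))
          for group in groups))          # True: each group is a complete replication
```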

Before leaving the school-girl problem, an illustration will be given of t = 28,
b = 63, r = 9, k = 4 and λ = 1. The following framework was set up by Dr.
C. P. Winsor using suggestions from Netto [15].


a1   a8   b3   b6
a2   a7   b1   b8
a3   a6   c4   c5
a4   a5   c1   c8
b2   b7   c3   c6
b4   b5   c2   c7.


a, b and c each have every internal difference once and only once; and each pair
a-b, a-c and b-c must have every external difference once and only once. The
nine groups are given in table III. The cyclic substitution is within three sets,
a, b and c. That is,

in group I, a = 1, a1 = 2, a2 = 3, ⋯, a8 = 9;

in group II, a = 2, a1 = 3, a2 = 4, ⋯, a8 = 1;

in group III, a = 3, a1 = 4, a2 = 5, ⋯, a8 = 2;

etc.

Netto [15] discusaes t elements in aeta of k, every set of 2 elements to occur 
together in a set exactly X times. He deals with X = 1, and gives a discussion 
of both sub-classes when k = 3, that is, for t = 6m + 1 and t = 6m + 3. Reiss
[16] and Moore [13] have proved that configurations can be constructed for all
such values of t when k = 3. This is the type of information which is valuable in answering the first question in the introduction of this article: "what configurations
exist?" Carmichael [5] mentions the quadruple systems 6m + 2 and 6m + 4
and states that the general problem of their existence appears not to have been
solved. Also, for the higher values of k there seems to be very little known of
any generality, but it is known that for k > 3 there are certain configurations
which are not possible.


TABLE III

Configuration for t = 28, b = 63, r = 9, k = 4, λ = 1

Type      Group I       Group II      Group III     Group IV
k a b c   28  1 10 19   28  2 11 20   28  3 12 21   28  4 13 22
a a b b    2  9 13 16    3  1 14 17    4  2 15 18    5  3 16 10
a a b b    3  8 11 18    4  9 12 10    5  1 13 11    6  2 14 12
a a c c    4  7 23 24    5  8 24 25    6  9 25 26    7  1 26 27
a a c c    5  6 20 27    6  7 21 19    7  8 22 20    8  9 23 21
b b c c   12 17 22 25   13 18 23 26   14 10 24 27   15 11 25 19
b b c c   14 15 21 26   15 16 22 27   16 17 23 19   17 18 24 20

Type      Group V       Group VI      Group VII     Group VIII    Group IX
k a b c   28  5 14 23   28  6 15 24   28  7 16 25   28  8 17 26   28  9 18 27
a a b b    6  4 17 11    7  5 18 12    8  6 10 13    9  7 11 14    1  8 12 15
a a b b    7  3 15 13    8  4 16 14    9  5 17 15    1  6 18 16    2  7 10 17
a a c c    8  2 27 19    9  3 19 20    1  4 20 21    2  5 21 22    3  6 22 23
a a c c    9  1 24 22    1  2 25 23    2  3 26 24    3  4 27 25    4  5 19 26
b b c c   16 12 26 20   17 13 27 21   18 14 19 22   10 15 20 23   11 16 21 24
b b c c   18 10 25 21   10 11 26 22   11 12 27 23   12 13 19 24   13 14 20 25
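The defining properties of a configuration such as Table III can be checked mechanically. The sketch below is only an illustration and not part of the original method: it assumes the table has been transcribed group by group into a Python list (the names `TABLE_III`, `is_replication` and `pair_counts` are ours) and checks that each group is a complete replication of the 28 elements and that every pair of elements occurs together in exactly one set (λ = 1).

```python
from collections import Counter
from itertools import combinations

# Table III transcribed group by group; each inner list is one set (k = 4).
# TABLE_III is an illustrative name, not taken from the paper.
TABLE_III = [
    [[28, 1, 10, 19], [2, 9, 13, 16], [3, 8, 11, 18], [4, 7, 23, 24], [5, 6, 20, 27], [12, 17, 22, 25], [14, 15, 21, 26]],  # I
    [[28, 2, 11, 20], [3, 1, 14, 17], [4, 9, 12, 10], [5, 8, 24, 25], [6, 7, 21, 19], [13, 18, 23, 26], [15, 16, 22, 27]],  # II
    [[28, 3, 12, 21], [4, 2, 15, 18], [5, 1, 13, 11], [6, 9, 25, 26], [7, 8, 22, 20], [14, 10, 24, 27], [16, 17, 23, 19]],  # III
    [[28, 4, 13, 22], [5, 3, 16, 10], [6, 2, 14, 12], [7, 1, 26, 27], [8, 9, 23, 21], [15, 11, 25, 19], [17, 18, 24, 20]],  # IV
    [[28, 5, 14, 23], [6, 4, 17, 11], [7, 3, 15, 13], [8, 2, 27, 19], [9, 1, 24, 22], [16, 12, 26, 20], [18, 10, 25, 21]],  # V
    [[28, 6, 15, 24], [7, 5, 18, 12], [8, 4, 16, 14], [9, 3, 19, 20], [1, 2, 25, 23], [17, 13, 27, 21], [10, 11, 26, 22]],  # VI
    [[28, 7, 16, 25], [8, 6, 10, 13], [9, 5, 17, 15], [1, 4, 20, 21], [2, 3, 26, 24], [18, 14, 19, 22], [11, 12, 27, 23]],  # VII
    [[28, 8, 17, 26], [9, 7, 11, 14], [1, 6, 18, 16], [2, 5, 21, 22], [3, 4, 27, 25], [10, 15, 20, 23], [12, 13, 19, 24]],  # VIII
    [[28, 9, 18, 27], [1, 8, 12, 15], [2, 7, 10, 17], [3, 6, 22, 23], [4, 5, 19, 26], [11, 16, 21, 24], [13, 14, 20, 25]],  # IX
]

def is_replication(group, t):
    """True if the sets of one group contain each of the elements 1..t exactly once."""
    return sorted(x for s in group for x in s) == list(range(1, t + 1))

def pair_counts(groups):
    """Number of sets in which each unordered pair of elements occurs together."""
    return Counter(frozenset(p) for g in groups for s in g for p in combinations(s, 2))

counts = pair_counts(TABLE_III)
print(all(is_replication(g, 28) for g in TABLE_III))  # True: each of the r = 9 groups is a replication
print(len(counts), set(counts.values()))               # 378 {1}: every pair occurs exactly once
```

The same two functions apply unchanged to any resolvable design written as a list of groups.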

3. The method of geometrical configuration. Another aid in the construction
of balanced incomplete block designs is found in some of the finite projective
geometries. These are described by Carmichael [5]. A tactical configuration
of rank two is defined as a combination of l elements into m sets, each set containing λ distinct elements, and each element occurring in μ distinct sets. When the points of a finite geometry are taken as the elements and its lines as the sets,

l (= t) = number of points in the geometry,
m (= b) = number of lines,
λ (= k) = number of points on a line,
μ (= r) = number of lines on a point.

The series of finite projective geometries PG(κ, p^n) for κ > 1 furnishes a
certain infinite class of these tactical configurations. The following list gives
those which have been incorporated in the list (table II) of useful balanced
incomplete block designs.


Two dimensional space, PG(2, p^n)

p^n    l (t)   m (b)   λ (k)   μ (r)
2        7       7       3       3
3       13      13       4       4
2^2     21      21       5       5
5       31      31       6       6
7       57      57       8       8
2^3     73      73       9       9
3^2     91      91      10      10
11     133     133      12      12
13     183     183      14      14


Three dimensional space, PG(3, p^n)

p^n    l (t)   m (b)   λ (k)   μ (r)
2       15      35       3       7

From the Euclidean geometry EG(κ, p^n) for κ > 1 other tactical configurations can be constructed. These are formed from the PG(κ, p^n) by omitting a given line, together with the points on it, from the two dimensional space, and a plane, together with the points on it, from the three dimensional space configurations. Some of the resulting designs are:


Two dimensional space, EG(2, p^n)

p^n    l (t)   m (b)   λ (k)   μ (r)
2        4       6       2       3
3        9      12       3       4
2^2     16      20       4       5
5       25      30       5       6
7       49      56       7       8
2^3     64      72       8       9
3^2     81      90       9      10
11     121     132      11      12
13     169     182      13      14
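The entries in these lists follow from the standard counting formulas for the finite geometries over a field of q = p^n elements. The sketch below (the function names are ours) reproduces them and checks the two identities every such configuration must satisfy: lμ = mλ, which counts point-line incidences two ways, and μ(λ − 1) = l − 1, which says that each pair of points lies on exactly one line. It assumes q is a prime power; for other q the formulas still evaluate, but no geometry exists.

```python
def pg2(q):
    """PG(2, q): points, lines, points on a line, lines on a point."""
    l = q * q + q + 1
    return l, l, q + 1, q + 1

def pg3_lines(q):
    """PG(3, q) with the lines taken as the sets."""
    l = q**3 + q**2 + q + 1
    m = (q**2 + 1) * (q**2 + q + 1)
    return l, m, q + 1, q**2 + q + 1

def eg2(q):
    """EG(2, q): PG(2, q) with one line and its q + 1 points removed."""
    return q * q, q * q + q, q, q + 1

def is_pairwise_balanced(l, m, lam, mu):
    # l*mu == m*lam counts incidences two ways; mu*(lam - 1) == l - 1 means each pair of points occurs once.
    return l * mu == m * lam and mu * (lam - 1) == l - 1

for q in (2, 3, 4, 5, 7, 8, 9, 11, 13):
    print(q, pg2(q), eg2(q))
print(pg3_lines(2))  # (15, 35, 3, 7)
```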
Methods are available for constructing the two dimensional space PG(κ, p^n)
and the corresponding EG(κ, p^n) configurations where p is a prime number.
This being true, we can also construct the completely orthogonalized squares
from the EG(κ, p^n) geometry. The reverse situation, in which these configurations are constructed by using the completely orthogonalized squares, will now be
illustrated. These squares consist of superimposed Latin squares, fulfilling the
condition that each number from the second Latin square occurs once and only
once with each number in the first Latin square. As an example take the two
Latin squares:

Latin Square I      Latin Square II

1 2 3               1 3 2
2 3 1               2 1 3
3 1 2,              3 2 1.

Superimpose square II upon square I to get the completely orthogonalized
3 × 3 square,

11  23  32
22  31  13
33  12  21.


The first number in each cell is a value from square I; the second number in each
cell is from square II. Note that the numbers in the second place in each cell
occur once and only once with each of the first numbers; for example, 1 in the first place occurs with 1, 3, and 2 (cells 11, 13, and 12).
The completely orthogonalized squares have been proven to exist for all prime
numbers and for powers of prime numbers. The solution of this problem was
secured independently by Bose [2] and by Stevens [18]. Those of sides 2, 2^2, 2^3,
2^4, 2^5, 2^6, 3, 3^2, 3^3, 3^4, 5, 5^2, 5^3, 7, 7^2, 11 and 13 have been given.

The completely orthogonalized 3 × 3 square

11 (1)   23 (4)   32 (7)
22 (2)   31 (5)   13 (8)
33 (3)   12 (6)   21 (9)

may be used to construct a balanced incomplete block design. The numbers in parentheses, which follow the
cell entries, designate the 9 elements which are to be arranged in four groups of
three sets. Group I is formed by placing the elements from each row into separate sets; in group II the elements from the three columns are placed in three
sets; in group III the first set (7) consists of the elements which follow 1 in the
first place in the cells, set (8) consists of the elements which follow 2 in the first
place in the cells, and so on; and group IV is assembled in the same way as group III except
that the numbers in the second place in the cells are used to select the elements for
each set. Thus we have the configuration:
Group I (rows)     Group II (columns)     Group III (first place)     Group IV (second place)
Set                Set                    Set                         Set
(1) 1 4 7          (4) 1 2 3              (7) 1 6 8                   (10) 1 5 9
(2) 2 5 8          (5) 4 5 6              (8) 2 4 9                   (11) 2 6 7
(3) 3 6 9          (6) 7 8 9              (9) 3 5 7                   (12) 3 4 8


In the 12 sets of 3 elements, each of the 9 elements occurs with every other 
element once and only once in a set. 
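The four groups can be generated directly from the two Latin squares. In the sketch below (all names are ours), elements are numbered down the columns of the 3 × 3 square, as in the numbered square above, so that the cell in row r and column c (both counted from 0) carries element 3c + r + 1. A set in group III or IV collects the elements whose cell shows a given number in the first or second place.

```python
from collections import Counter
from itertools import combinations

SQUARE_I = [[1, 2, 3], [2, 3, 1], [3, 1, 2]]   # first place in each cell
SQUARE_II = [[1, 3, 2], [2, 1, 3], [3, 2, 1]]  # second place in each cell

def element(r, c, side):
    """Element number of cell (r, c), counting down the columns from 1."""
    return side * c + r + 1

def groups_from_squares(sq1, sq2):
    side = len(sq1)
    cells = [(r, c) for r in range(side) for c in range(side)]
    rows = [[element(r, c, side) for c in range(side)] for r in range(side)]
    cols = [[element(r, c, side) for r in range(side)] for c in range(side)]
    # Groups III and IV: one set for each symbol v of square I and of square II.
    first = [sorted(element(r, c, side) for r, c in cells if sq1[r][c] == v) for v in range(1, side + 1)]
    second = [sorted(element(r, c, side) for r, c in cells if sq2[r][c] == v) for v in range(1, side + 1)]
    return [rows, cols, first, second]

groups = groups_from_squares(SQUARE_I, SQUARE_II)
for g in groups:
    print(g)
# Every pair of the 9 elements should occur together in exactly one of the 12 sets.
pairs = Counter(frozenset(p) for g in groups for s in g for p in combinations(s, 2))
print(len(pairs), set(pairs.values()))  # 36 {1}
```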

This is an illustration of one series of configurations which can be constructed
with the aid of the completely orthogonalized squares. These are the EG(κ, p^n)
in two dimensional space, when κ = 2 and p^n = 2, 3, 2^2, 5, 7, 2^3, 3^2, 11, 13, . . . .
The PG(κ, p^n) configurations can be written by adding (k + 1) elements
to the previous group of configurations. For example, the elements 10, 11, 12
and 13 may be added to the groups, one to each group. That is, 10 is added to
each set in group I, 11 is added to each set in group II, 12 to group III and 13 to
group IV. An additional set must be added to include these four new elements.
A configuration for t = 13, b = 13, k = 4, r = 4 and λ = 1 results.


Set            Set            Set            Set
(1) 1 4 7 10   (4) 1 2 3 11   (7) 1 6 8 12   (10) 1 5 9 13
(2) 2 5 8 10   (5) 4 5 6 11   (8) 2 4 9 12   (11) 2 6 7 13
(3) 3 6 9 10   (6) 7 8 9 11   (9) 3 5 7 12   (12) 3 4 8 13
                                             (13) 10 11 12 13.


The 13 sets are made up of 4 elements each. These designs are symmetrical
for sets and elements; that is, every pair of elements occurs together in the same
number of sets, and every pair of sets has the same number of elements in
common. Discussions of the construction of these designs, with illustrations, are
given in references [20, 8, 9] and [19].
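The extension from the 12 sets to the 13 sets can be sketched in the same way. The helper below (our name, `add_new_elements`) appends one new element to every set of each group and then adds the set made up of the new elements. The checks confirm the two symmetric properties just described: every pair of elements occurs together once, and every pair of sets has exactly one element in common.

```python
from collections import Counter
from itertools import combinations

# The four groups of the 9-element configuration, as listed above.
AFFINE_GROUPS = [
    [[1, 4, 7], [2, 5, 8], [3, 6, 9]],
    [[1, 2, 3], [4, 5, 6], [7, 8, 9]],
    [[1, 6, 8], [2, 4, 9], [3, 5, 7]],
    [[1, 5, 9], [2, 6, 7], [3, 4, 8]],
]

def add_new_elements(groups, first_new):
    """Append element first_new + i to every set of group i, then add the set of new elements."""
    new = [first_new + i for i in range(len(groups))]
    sets = [s + [new[i]] for i, g in enumerate(groups) for s in g]
    return sets + [new]

sets = add_new_elements(AFFINE_GROUPS, 10)
pairs = Counter(frozenset(p) for s in sets for p in combinations(s, 2))
meets = {len(set(a) & set(b)) for a, b in combinations(sets, 2)}
print(len(sets), set(pairs.values()), meets)  # 13 {1} {1}
```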

In the PG(κ, p^n) series of designs, as constructed by means of completely
orthogonalized squares, the sets cannot be arranged in replication groups. However, these configurations can be arranged in Youden squares [22], in which all
the sets are placed side by side and all the elements in a single row form a complete replication. This method of arrangement has been of considerable value
in experimentation with plants. The Youden squares are the PG(κ, p^n) when
κ = 2. Singer [17] gives a partial list of the (reduced) perfect difference sets
(table IV), only a single set for each p^n. The number of distinct perfect difference sets (or the number of distinct perfect partitions) for a given p^n is equal to
φ(q)/3n, where φ is Euler's function and q = p^{2n} + p^n + 1. Since each perfect difference set can be paired with its inverse, the
number is even.

The construction of one of the Youden squares from its perfect difference set
will be illustrated. Consider p^n = 3; then q = p^{2n} + p^n + 1 = 3^2 + 3 + 1 = 13.
There are two perfect difference sets, each with its inverse, for q = 13. One perfect
difference set is 0, 1, 3, 9, which has the perfect partition 1, 2, 6, 4: sums of cyclically
consecutive parts give each number from 1 to 12 exactly once, and the four parts
together add to 13. The elements of the perfect difference set are put in set (1), except
that 13 replaces 0. Set (2) is secured by a one-step cyclic substitution, 1 for
13, 2 for 1, 4 for 3 and 10 for 9. This process is continued until there are thirteen
sets. If the substitution is applied to set (13), the elements in set (1) are secured.

Set            (1) (2) (3) (4) (5) (6) (7) (8) (9) (10) (11) (12) (13)

Replication A   13   1   2   3   4   5   6   7   8    9   10   11   12
Replication B    1   2   3   4   5   6   7   8   9   10   11   12   13
Replication C    3   4   5   6   7   8   9  10  11   12   13    1    2
Replication D    9  10  11  12  13   1   2   3   4    5    6    7    8.


This is the Youden square for t = 13, b = 13, r = 4, k = 4, and λ = 1. The
elements in each row form a complete replication.
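The cyclic development can be written in a few lines. In the sketch below (the function name `youden_rows` is ours), set (j) is the perfect difference set 0, 1, 3, 9 shifted by j − 1 modulo 13, with the residue 0 written as 13; listing the i-th element of every set gives row i of the square.

```python
from collections import Counter
from itertools import combinations

def youden_rows(diff_set, q):
    """Row i holds, for sets (1)..(q), the i-th difference-set element shifted by j (mod q), 0 written as q."""
    return [[(d + j) % q or q for j in range(q)] for d in diff_set]

rows = youden_rows([0, 1, 3, 9], 13)
for label, row in zip("ABCD", rows):
    print(label, *row)

sets = [set(col) for col in zip(*rows)]  # column j is set (j + 1)
pairs = Counter(frozenset(p) for s in sets for p in combinations(sorted(s), 2))
# Each row should be a complete replication, and every pair should occur in exactly one set.
print(all(sorted(r) == list(range(1, 14)) for r in rows), set(pairs.values()))  # True {1}
```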


TABLE IV

Singer's list of perfect difference sets

p^n    q      φ(q)/3n   Perfect difference set
2      7      2         0 1 3
2^2    21     2         0 1 4 14 16
2^3    73     8         0 1 3 7 15 31 36 54 63
2^4    273    12        0 1 3 7 15 31 63 90 116 127 136 181 194 204 . . .
3      13     4         0 1 3 9
3^2    91     12        0 1 3 9 27 49 56 61 77 81
5      31     10        0 1 3 8 12 18
7      57     12        0 1 3 13 32 36 43 52
11     133    36        0 1 3 12 20 34 38 81 88 94 104 109
13     183    40        0 1 3 16 23 28 42 76 82 86 119 137 154 175
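Singer's entries are easy to check by machine: a set is a perfect difference set modulo q exactly when its differences, taken in both orders, give every nonzero residue once, and its perfect partition is the list of gaps between successive elements taken cyclically. The sketch below (names ours) checks the complete rows of Table IV, computes φ(q)/3n with a naive totient, and recovers the partition 1, 2, 6, 4 quoted above for q = 13. The q = 273 row is left out because its set is incomplete here.

```python
from math import gcd

# (n, q, phi(q)/3n, perfect difference set) from Table IV; the q = 273 row is omitted.
TABLE_IV = [
    (1, 7, 2, [0, 1, 3]),
    (2, 21, 2, [0, 1, 4, 14, 16]),
    (3, 73, 8, [0, 1, 3, 7, 15, 31, 36, 54, 63]),
    (1, 13, 4, [0, 1, 3, 9]),
    (2, 91, 12, [0, 1, 3, 9, 27, 49, 56, 61, 77, 81]),
    (1, 31, 10, [0, 1, 3, 8, 12, 18]),
    (1, 57, 12, [0, 1, 3, 13, 32, 36, 43, 52]),
    (1, 133, 36, [0, 1, 3, 12, 20, 34, 38, 81, 88, 94, 104, 109]),
    (1, 183, 40, [0, 1, 3, 16, 23, 28, 42, 76, 82, 86, 119, 137, 154, 175]),
]

def is_perfect_difference_set(ds, q):
    """Every nonzero residue mod q arises exactly once as a difference a - b."""
    diffs = sorted((a - b) % q for a in ds for b in ds if a != b)
    return diffs == list(range(1, q))

def perfect_partition(ds, q):
    """Cyclic gaps between successive elements; they always sum to q."""
    s = sorted(ds)
    return [b - a for a, b in zip(s, s[1:] + [s[0] + q])]

def phi(q):
    """Euler's function, computed naively."""
    return sum(1 for k in range(1, q + 1) if gcd(k, q) == 1)

for n, q, count, ds in TABLE_IV:
    print(q, is_perfect_difference_set(ds, q), phi(q) // (3 * n) == count)
print(perfect_partition([0, 1, 3, 9], 13))  # [1, 2, 6, 4]
```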
A third series of configurations, called lattice squares or quasi-Latin squares
[21], can be constructed by using the completely orthogonalized squares. The
groups of sets constructed above from the 3 × 3 square are taken in pairs. For each pair a square is constructed
having its rows formed by the sets of one group and its columns by the sets of
another group. For example, square I below is made so that the sets of group I
form the rows and the sets of group II form the columns. Square II is the
combination of groups III and IV.


Square I      Square II

1 4 7         1 6 8
2 5 8         9 2 4
3 6 9         5 7 3
In this lattice square each pair of elements occurs together once only in either a
row or a column of either one of the squares. Also, every element occurs with 
every other element once in one column and one row from each square. 

A device known as "complements" gives several configurations. From an
arrangement having k elements in each set, a second one can be obtained for the same number
of elements, in sets of t − k units. This is done by replacing each set by its
complement, that is, by a set containing all the elements missing from the
original set. An illustration follows:


t = 7, b = 7        t = 7, b = 7
r = 3, k = 3        r = 4, k = 4
λ = 1               λ = 2

Set                 Set
(1) 1 2 4           (1) 3 5 6 7
(2) 2 3 5           (2) 1 4 6 7
(3) 3 4 6           (3) 1 2 5 7
(4) 4 5 7           (4) 1 2 3 6
(5) 5 6 1           (5) 2 3 4 7
(6) 6 7 2           (6) 1 3 4 5
(7) 7 1 3           (7) 2 4 5 6.
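Complementation is a one-line operation on the sets. The sketch below (names ours) forms the complements of the seven sets on the left and confirms that every pair of elements then occurs together in λ = 2 sets, as stated for the design on the right.

```python
from collections import Counter
from itertools import combinations

ORIGINAL = [[1, 2, 4], [2, 3, 5], [3, 4, 6], [4, 5, 7], [5, 6, 1], [6, 7, 2], [7, 1, 3]]

def complements(sets, t):
    """Replace each set by the elements of 1..t that it lacks."""
    universe = set(range(1, t + 1))
    return [sorted(universe - set(s)) for s in sets]

def pair_lambdas(sets):
    """(number of distinct pairs covered, set of pair multiplicities)."""
    counts = Counter(frozenset(p) for s in sets for p in combinations(s, 2))
    return len(counts), set(counts.values())

comp = complements(ORIGINAL, 7)
print(comp)
print(pair_lambdas(ORIGINAL), pair_lambdas(comp))  # (21, {1}) (21, {2})
```

In general, a pair of elements lies together in a complementary set exactly when neither element is in the original set, so the complement of a balanced design has λ' = b − 2r + λ; here 7 − 6 + 1 = 2.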

While the triple systems, quadruple systems, etc., which have been considered by some mathematicians, do furnish designs meeting the balance requirements, they are usually not suitable for experimental purposes. A quadruple system requires that every possible triple of elements occur once and only
once together in a block. Since we need only every pair together once (λ = 1)
or more, only the triple systems are generally useful.

4. Summary. The mathematical theory of configuration has been helpful 
in the construction of the balanced incomplete block designs. It would be use- 
ful to know (a) what configurations (within the useful range) exist, (b) how these 
configurations may be constructed. In table I the configurations have been 
classified according to the value of X, while in table II configurations within a 
useful range have been listed. Of the designs in this table which have not been 
constructed, some are known to exist. Those aids which have been used in the 
construction of the balanced incomplete block designs have been briefly dis- 
cussed. 
A COMPARISON OF ALTERNATIVE TESTS OF SIGNIFICANCE FOR
THE PROBLEM OF m RANKINGS¹

By Milton Friedman

A paper published in 1937 [2] suggested that the consilience of a number of
sets of ranks can be tested by computing a statistic designated χ_r². A mathematical proof by S. S. Wilks demonstrated that the distribution of χ_r² approaches
the ordinary χ² distribution as the number of sets of ranks increases. The
rapidity with which this limiting distribution is approached was investigated by
obtaining the exact distributions of χ_r² for a number of special cases. It was
concluded that "when the number of sets of ranks is moderately large (say
greater than 5 for four or more ranks) the significance of χ_r² can be tested by
reference to the available χ² tables" [2, p. 695]. The use of the normal distribution was recommended when the number of ranks in each set is large but the
number of sets of ranks is small, although no rigorous justification of this procedure was presented.

Except for the few special cases for which exact distributions were given, the
paper did not provide a test of significance for data involving fewer than six sets of
ranks and a small or moderate number of ranks in each set. This important
gap has now been filled by M. G. Kendall and B. Babington Smith [1]. In
addition, they furnish a somewhat more exact test of significance for tables of
ranks for which the earlier article recommended the use of the χ² distribution.

Kendall and Smith use a different statistic, W, defined as χ_r² divided by its
maximum value, m(n − 1), where n is the number of items ranked and m the
number of sets of ranks.² The new statistic (independently suggested by W.
Allen Wallis [3], who terms it the rank correlation ratio and denotes it by η_r²) is
thus not fundamentally different from χ_r². A more radical innovation is the
improvement in the test of significance that they suggest. Instead of testing
χ_r² by reference to the χ² distribution for n − 1 degrees of freedom, Kendall and
Smith, generalizing from the first four moments of W, recommend that the
significance of W be tested by reference to the analysis of variance distribution
(Fisher's z-distribution) with

$$z = \tfrac{1}{2}\log_e \frac{(m-1)W}{1-W}, \qquad n_1 = (n-1) - \frac{2}{m}, \qquad n_2 = (m-1)\left[(n-1) - \frac{2}{m}\right].$$

For small values of m and n, they introduce continuity corrections, substituting for

$$W = \frac{12S}{m^2(n^3-n)}$$

the statistic

$$\frac{S-1}{\tfrac{1}{12}m^2(n^3-n) + 2},$$

where S is the observed sum of squares of the deviations of sums of ranks from
the mean value, m(n + 1)/2. Comparison with exact distributions of W (or S)
for special cases indicates that this test yields very good approximations to the
correct probabilities.

¹ The author is indebted to Mr. W. Allen Wallis for valuable criticism and to Miss Edna
R. Ehrenberg for computational assistance.

² This is Kendall and Smith's notation, which will be used in the present paper. The
original paper [2] designated the number of items ranked by p, and the number of sets of
ranks by n.
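The quantities defined above can be computed directly from a table of ranks. The sketch below is ours, not Kendall and Smith's: it takes m rankings of n items, computes S, χ_r² = 12S/mn(n + 1) and W, and then the z, n₁ and n₂ of the Kendall and Smith test, optionally with the continuity corrections. The ranking data are made up for illustration.

```python
import math

def rank_statistics(ranks):
    """ranks: m lists, each a ranking 1..n of the same n items. Returns (S, chi2_r, W)."""
    m, n = len(ranks), len(ranks[0])
    sums = [sum(row[j] for row in ranks) for j in range(n)]
    mean = m * (n + 1) / 2
    S = sum((R - mean) ** 2 for R in sums)
    chi2_r = 12 * S / (m * n * (n + 1))
    W = 12 * S / (m * m * (n**3 - n))  # equals chi2_r / (m(n - 1))
    return S, chi2_r, W

def kendall_smith_z(S, m, n, continuity=True):
    """z, n1, n2; with continuity, W is replaced by (S - 1)/(m^2(n^3 - n)/12 + 2).
    Assumes the (corrected) W lies strictly between 0 and 1, otherwise the log is undefined."""
    divisor = m * m * (n**3 - n) / 12
    W = (S - 1) / (divisor + 2) if continuity else S / divisor
    z = 0.5 * math.log((m - 1) * W / (1 - W))
    n1 = (n - 1) - 2 / m
    n2 = (m - 1) * n1
    return z, n1, n2

ranks = [[1, 2, 3], [1, 3, 2], [2, 1, 3]]  # m = 3 sets of ranks, n = 3 items (made-up data)
S, chi2_r, W = rank_statistics(ranks)
print(S, chi2_r, W)
print(kendall_smith_z(S, m=3, n=3))
```

Here exp(2z) = (m − 1)W/(1 − W) is the corresponding variance ratio on n₁ and n₂ degrees of freedom. Obtaining a probability from it still needs a table or a routine that accepts fractional degrees of freedom; SciPy's `scipy.stats.f.sf`, for example, should serve, but it is not used in this sketch.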

In the limit the two tests of significance are identical. Neglecting the
correction for continuity,

$$z = \tfrac{1}{2}\log_e \frac{(m-1)\chi_r^2}{m(n-1) - \chi_r^2} \;\to\; \tfrac{1}{2}\log_e \frac{\chi_r^2}{n-1},$$

$$n_2 = (m-1)\left[(n-1) - \frac{2}{m}\right] \to \infty, \qquad n_1 = (n-1) - \frac{2}{m} \to n-1 \qquad \text{as } m \to \infty.$$

For n₂ = ∞, the analysis of variance distribution is identical with the distribution
of ½ log_e(χ²/n₁). The difference between the two tests is thus that one, χ², uses
a single (limiting) distribution for all values of m, whereas the other, z, adapts
the distribution to the value of m.

The necessity of taking into account the value of m, while it increases the
flexibility of the distribution, makes the z test somewhat less convenient in
practice than the χ² test. Additional computation is required to obtain the
values of n₁ and n₂, and to make the continuity corrections. It is also fairly
laborious to test the significance of the result, if exact values of z at any level of
significance are required. In these instances, two-way interpolation of reciprocals in the analysis of variance tables is necessary, since both n₁ and n₂ are
always fractional. These difficulties make it desirable to investigate the rapidity
with which the significance levels given by the z test approach those given by the
χ² test, and thus to determine the range of values of m and n for which the simpler
test can safely be employed. This investigation will yield as a by-product the
.05 and .01 significance values of χ_r² (or W or S) for selected values of m and n as
determined by the z test.

Table I presents a summary comparison of the values of χ_r² at the .05 and .01
levels of significance as shown by (1) exact distributions, (2) the z test with
continuity corrections, and (3) the χ² test.³ The significance values are expressed in
terms of χ_r² rather than W because, for a given number of ranks per set (i.e., a
given n), the significance values given by the χ² test are the same regardless of the
number of sets of ranks (i.e., of the value of m). This would not be so if W
were employed, since W = χ_r²/m(n − 1). The expected value of W depends on
m and approaches zero as m → ∞, while the expected value of χ_r² is equal to n − 1
for all values of m.

The values given by the z test agree remarkably well with the exact values.
With but two exceptions (the .01 values for n = 3, m = 8 and 10), the exact
value differs very much less from the value given by the z test than from the
value given by the χ² test. In all but three of the 12 comparisons, the z test
gives a value below the correct one.⁴

³ The values of χ_r² computed using the z test that are given in Tables I and II were obtained with the aid of Fisher and Yates' Table V [4]. Linear interpolation of reciprocals
was employed throughout.

TABLE I

Comparison of Values of χ_r² at .05 and .01 Levels of Significance Yielded by Exact
Distributions, z Test with Continuity Corrections, and χ² Test

.05 Level of Significance

               From exact distribution     From z test with
 n    m        Limits       Interpolated   continuity          From
                            value*         corrections         χ² test
 3    8        5.25-6.25    6.16           6.012               5.991
      9        6.0-6.22     6.17           6.004               5.991
      10       5.6-6.2      6.08           5.999               5.991
      ∞                                    5.991               5.991
 4    4        7.5-7.8      7.54           7.43                7.82
      5        7.32-7.8     7.54           7.52                7.82
      6        7.4-7.6      7.49           7.57                7.82
      ∞                                    7.82                7.82
 5    3        8.27-8.53    8.41           8.59                9.49
      ∞                                    9.49                9.49

.01 Level of Significance

               From exact distribution     From z test with
 n    m        Limits       Interpolated   continuity          From
                            value*         corrections         χ² test
 3    8                     9.00           8.35                9.21
      9                     8.67           8.44                9.21
      10       8.6-9.6      9.04           8.51                9.21
      ∞                                    9.21                9.21
 4    4        9.3-9.6      9.42           9.21                11.34
      5        9.72-9.96    9.87           9.66                11.34
      6                     10.00          9.95                11.34
      ∞                                    11.34               11.34
 5    3        9.87-10.13   10.05          10.08               13.28
      ∞                                    13.28               13.28

* Computed by linear interpolation of probabilities.


Table II gives, for a very much larger number of values of m and n, the .05
and .01 values of χ_r² computed on the basis of the z test with continuity corrections. The values entered for m = ∞ are obtained from χ² tables for n − 1
degrees of freedom and are the significance values by the χ² test for all values of
m. It is apparent that as m increases the .01 and .05 values of χ_r² approach their
limiting values very rapidly. For n = 7, two-thirds of the difference between
the .05 values for m = 3 and m = ∞, and an even larger proportion of the
difference between the .01 values, disappears by the time m = 10; and the
situation is similar for the other values of n. Except for the .05 values for n = 3,
the approach to the limit is monotonic from below. The use of the χ² test thus
tends to lead to the overestimation of the significance values and of the probabilities attached to observed values of χ_r². It is clear, however, that for large and
even moderate values of m the χ² test is, for all practical purposes, equivalent
to the z test.

⁴ These comparisons duplicate some of those made by Kendall and Smith and merely
serve to confirm their conclusion that the z test with continuity corrections gives exceedingly good results.

The values obtained using the z test without continuity corrections agree less well with
the exact values than those obtained with the aid of the continuity corrections. However,
even if no continuity corrections are made, the z test in general yields values closer to the
exact values than does the χ² test.


TABLE II

Values of χ_r² at .05 and .01 Levels of Significance Computed on the Basis of Kendall
and Smith's z Test, with Continuity Corrections; .10, .075, .02, .015 Values of χ²

Values at .05 Level of Significance

             n = 3    n = 4    n = 5    n = 6    n = 7
m = 3                          8.59     9.90     11.24
m = 4                 7.43     8.84     10.24    11.62
m = 5                 7.52     8.98     10.42    11.84
m = 6                 7.57     9.08     10.54    11.97
m = 8        6.012    7.63     9.18     10.68    12.14
m = 10       5.999    7.67     9.25     10.76    12.23
m = 15       5.985    7.72     9.33     10.87    12.36
m = 20       5.983    7.74     9.37     10.92    12.42
m = 100      5.987    7.80     9.46     11.04    12.56
m = ∞        5.991    7.82     9.49     11.07    12.59
χ² (.10)     4.605    6.25     7.78     9.24     10.64
χ² (.075)*   5.18     6.90     8.49     10.00    11.45

Values at .01 Level of Significance

             n = 3    n = 4    n = 5    n = 6    n = 7
m = 3                          10.08    11.69    13.26
m = 4                 9.21     10.93    12.59    14.19
m = 5                 9.66     11.42    13.11    14.74
m = 6                 9.95     11.74    13.45    15.09
m = 8        8.35     10.31    12.13    13.87    15.53
m = 10       8.51     10.52    12.37    14.11    15.79
m = 15       8.74     10.79    12.67    14.44    16.14
m = 20       8.85     10.93    12.82    14.60    16.31
m = 100      9.14     11.26    13.19    14.99    16.71
m = ∞        9.21     11.34    13.28    15.09    16.81
χ² (.02)     7.82     9.84     11.67    13.39    15.03
χ² (.015)*   8.40     10.46    12.34    14.09    15.77

* Computed from Fisher and Yates' Table IV [4] by linear interpolation between the
logarithms of the probabilities.

In order to determine more precisely the range of values of m and n for which
the approximation given by the χ² test is adequate, it is necessary to adopt some
convention about the error in estimated significance values of χ_r² that is tolerable.
Since the conclusion drawn from an observed χ_r² depends on the probability
that it will be exceeded by chance, this convention clearly should be expressed in
terms of the error in the probability.

The structure of published χ² tables makes it convenient to accept an estimated
probability between .10 and .05 as a tolerable approximation to a correct probability of .05, and an estimated probability between .02 and .01 as a tolerable
approximation to a correct probability of .01. These ranges of tolerance are
entirely on one side of the correct probability because, as pointed out above, the
error in using the χ² test is consistent in direction. These ranges are purely
arbitrary, of course, and many may think them too broad.

On the basis of this or some similar convention it is possible to make objective
statements concerning the range of values of m and n for which the χ² test is
adequate. The next to the last line in the first section of Table II gives the .10
values of χ²; the next to the last line in the second section, the .02 values. All
the .05 values of χ_r² shown in the table exceed the .10 value of χ². Using the χ²
test, all of the values (with two exceptions for n = 3) would signify a probability
greater than .05 but less than .10. Thus the error made at the .05 level is
within the admissible range according to the suggested convention. The χ²
test is therefore an adequate substitute for the z test at the .05 level for all
values of m and n, except possibly for a few of the values for which exact distributions are available.

As might be expected, the χ² test is less satisfactory at the .01 level. For
values of m less than six, the .01 values of χ_r² computed using the z test with
continuity corrections are less than the .02 value of χ². For m greater than 5,
the values of χ_r² in the table would all be accorded a probability greater than .01
but less than .02 if the χ² test were employed. As already noted, this is the range
of values of m for which the original paper suggested the χ² test could validly be
used [2, p. 695].

In view of the arbitrary nature of the convention as to the permissible error
in the probability attached to an observed value of χ_r², it is interesting to investigate the effect of an alternative and stricter convention, namely, that only
probabilities from .075 to .05 and from .015 to .01 be accepted as approximations
to correct probabilities of .05 and .01 respectively. The .075 and .015 values of
χ² are given in the last lines of the two sections of Table II. On the basis of this
convention the χ² test is adequate at the .05 level for m greater than three, and
at the .01 level for m greater than nine, except possibly for a few of the values
for which exact distributions are available. Thus even so drastic a lowering of
the permissible margin of error as halving it limits only slightly the range of
values of m for which the χ² test is adequate.


TABLE III

Values of S at .05 and .01 Levels of Significance Computed on the Basis of Kendall
and Smith's z Test, with Continuity Corrections

Values at .05 Level of Significance

                                                          Additional values
          n = 3    n = 4    n = 5    n = 6    n = 7       for n = 3
                                                          m      S
m = 3                       64.4     103.9    157.3       9      54.0
m = 4              49.5     88.4     143.3    217.0       12     71.9
m = 5              62.6     112.3    182.4    276.2       14     83.8
m = 6              75.7     136.1    221.4    335.2       16     95.8
m = 8     48.1     101.7    183.7    299.0    453.1       18     107.7
m = 10    60.0     127.8    231.2    376.7    571.0
m = 15    89.8     192.9    349.8    570.5    864.9
m = 20    119.7    258.0    468.5    764.4    1158.7

Values at .01 Level of Significance

                                                          Additional values
          n = 3    n = 4    n = 5    n = 6    n = 7       for n = 3
                                                          m      S
m = 3                       75.6     122.8    185.6       9      76.9
m = 4              61.4     109.3    176.2    265.0       12     103.5
m = 5              80.5     142.8    229.4    343.8       14     121.9
m = 6              99.5     176.1    282.4    422.6       16     140.2
m = 8     66.8     137.4    242.7    388.3    579.9       18     158.6
m = 10    85.1     175.3    309.1    494.0    737.0
m = 15    131.0    269.8    475.2    758.2    1129.5
m = 20    177.0    364.2    641.2    1022.2   1521.9

Table II provides, of course, a direct means of testing the significance of
observed values of χ_r² for the tabled values of m and n. For this purpose, however, Table III, giving the significance values of S, is more useful, since it obviates
the necessity of converting S into χ_r². For n = 3 Table III includes a few
values of m in addition to those in Table II.
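Moving between the two tables only requires the relation χ_r² = 12S/mn(n + 1) used above. A small sketch (function names ours) reproduces, up to the rounding of the tabled values, the Table III entry 376.7 for n = 6, m = 10 from the Table II value 10.76, and the Table II entry 9.25 for n = 5, m = 10 from the Table III value 231.2.

```python
def s_from_chi2(chi2_r, m, n):
    """S = chi2_r * m n (n + 1) / 12."""
    return chi2_r * m * n * (n + 1) / 12

def chi2_from_s(S, m, n):
    """chi2_r = 12 S / (m n (n + 1))."""
    return 12 * S / (m * n * (n + 1))

print(round(s_from_chi2(10.76, m=10, n=6), 1))  # 376.6; Table III gives 376.7
print(round(chi2_from_s(231.2, m=10, n=5), 2))  # 9.25, as in Table II
```

The small discrepancy in the first line reflects only that the tabled χ_r² is itself rounded to two decimals.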

SUMMARY

The preceding analysis suggests that the χ² test of the significance of χ_r²
(or W or η_r²), while less accurate than the z test proposed by Kendall and Smith,
is adequate for practical purposes at the .01 level of significance if the number of
sets of ranks (m) is greater than 5, and at the .05 level for any number of sets of
ranks, provided the number of ranks in each set (n) is more than 3. Exact
distributions are now available for n = 3, m = 3 to 10; n = 4, m = 3 to 6; and
n = 5, m = 3 [1]. The .05 and .01 values of χ_r² and S, computed using the
Kendall and Smith z test with continuity corrections, are given in Tables II
and III of the present note for n = 3 to 7 and selected values of m from 3 to 100.
For n greater than 7 and m less than 6, the z test with continuity corrections
should be employed. For all other combinations of n and m not covered by the
exact distributions or by Tables II and III, the χ² test is adequate.
NOTES

This section is devoted to brief research and expository articles, notes on methodology
and other short items.


NOTE ON AN APPROXIMATE FORMULA FOR THE SIGNIFICANCE
LEVELS OF z

By W. G. Cochran


1. Introduction. An important part has been played in modern statistical
analysis by the distribution of z = ½ log_e(s₁²/s₂²), when s₁² and s₂² are two independent
estimates of the same variance. In particular, all tests of significance in the
analysis of variance and in multiple regression problems are based on this
distribution. Complete tabulation of the frequency distribution of z is a heavy
task, because the distribution is a two-parameter one, the parameters being the
numbers of degrees of freedom, n₁ and n₂, in the estimates s₁² and s₂². Thus each
significance level of z requires a separate two-way table. Fisher constructed a
table of the 5 percent points in 1925 [1], and this has since been extended by
several workers [2] to the 20, 1, and 0.1 percent levels for a somewhat wider range
of values of n₁ and n₂.

With his original table, Fisher gave an approximate formula for the 5 percent
values of z, for high values of n₁ and n₂ outside the limits of his table. The
formula reads:

$$z\,(5\ \text{percent}) = \frac{1.6449}{\sqrt{h-1}} - 0.7843\left(\frac{1}{n_1} - \frac{1}{n_2}\right), \qquad \text{where } \frac{2}{h} = \frac{1}{n_1} + \frac{1}{n_2}. \tag{1}$$

The constant 1.6449 is the 5 percent significance level for a single tail of the normal distribution, and the constant 0.7843 will be found to be ⅙{2 + (1.6449)²}.
Thus the general formula for the significance levels of z derivable from (1) is

$$z = \frac{x}{\sqrt{h-1}} - \frac{x^2+2}{6}\left(\frac{1}{n_1} - \frac{1}{n_2}\right),$$

where x is a normal deviate with unit standard error. By inserting the appropriate significance level of x, this formula has been extended [2] to the tables of
the 20, 1, and 0.1 percent levels of z and commonly appears with all published
tables of z. The objects of this note are to indicate the derivation of the
formula and to suggest an improvement upon it in the latter cases.
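The general formula is straightforward to evaluate. In the sketch below (names ours), `x` is the one-tailed normal deviate, which can be obtained from Python's `statistics.NormalDist`; the first check reproduces the constant 0.7843, and the second evaluates (1) for n₁ = 40, n₂ = 60.

```python
import math
from statistics import NormalDist

def harmonic_h(n1, n2):
    """h defined by 2/h = 1/n1 + 1/n2."""
    return 2 / (1 / n1 + 1 / n2)

def z_fisher(x, n1, n2):
    """Fisher's approximation: x/sqrt(h - 1) - (x^2 + 2)/6 * (1/n1 - 1/n2)."""
    h = harmonic_h(n1, n2)
    return x / math.sqrt(h - 1) - (x * x + 2) / 6 * (1 / n1 - 1 / n2)

x05 = NormalDist().inv_cdf(0.95)            # one-tailed 5 percent normal deviate, about 1.6449
print(round((x05**2 + 2) / 6, 4))           # 0.7843
print(round(z_fisher(1.6449, 40, 60), 4))   # 0.2334
```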
2. The transformation of the z-distribution to normality. For high values
of n₁ and n₂, the distribution of z approaches the normal distribution, the
principal deviation being a slight skewness introduced by the inequality of n₁
and n₂. It is therefore natural to seek an approximate formula for the distribution of z by examining its relation to the normal distribution. For the
z-distribution the ratio $\kappa_r/\kappa_2^{r/2}$, where $\kappa_r$ is the rth cumulant, is of the order
$n^{-(r-2)/2}$, where n is the smaller of n₁ and n₂. This property is common to a large
number of distributions which tend to normality; for example, the distribution
of the mean of a sample of size n from any distribution with finite cumulants.
Fisher and Cornish [3] have recently given a method, applicable to all distributions with this property, for transforming the distribution to a normal distribution to any desired order of approximation. They also obtained explicit
expressions for the significance levels of the original distribution in terms of the
significance levels of the normal distribution, discussing the z-distribution as a
particular example. The relation between z and the normal deviate x at the
same level of probability was found to be

$$z = \frac{x}{\sqrt{h}} - \frac{x^2+2}{6}\left(\frac{1}{n_1}-\frac{1}{n_2}\right) + \left\{\frac{x^3+3x}{12h\sqrt{h}} + \frac{x^3+11x}{144}\sqrt{h}\left(\frac{1}{n_1}-\frac{1}{n_2}\right)^2\right\}, \tag{2}$$

the three terms on the right hand side being respectively of order $n^{-1/2}$, $n^{-1}$ and
$n^{-3/2}$, so that terms of order $n^{-2}$ are neglected.¹

If this equation is compared with equation (1), the latter appears at first
sight to be the approximation of order $n^{-1}$ to the z-distribution, except that the
divisor of x is √(h − 1) in (1) and √h in (2). Computation of a few values
shows that at the 5 percent level, equation (1) is the better approximation. For
example, for n₁ = 40, n₂ = 60, (1) gives z (5 percent) = .2334, (2) taken to terms of order $n^{-1}$ gives .2309,
and the exact value is .2332.

Since

$$\frac{x}{\sqrt{h-1}} = \frac{x}{\sqrt{h}} + \frac{x}{2h\sqrt{h}} + \text{terms of order } n^{-5/2},$$

Fisher's approximation differs from (2) by including a correction term of order
$n^{-3/2}$. Inspection of the true correction terms of this order in equation (2) shows
that for finite values of n₁ and n₂ the term

$$\frac{x^3+11x}{144}\sqrt{h}\left(\frac{1}{n_1}-\frac{1}{n_2}\right)^2$$

is considerably smaller than the term (x³ + 3x)/(12h√h), since the former has a smaller numerical
coefficient and involves the difference between 1/n₁ and 1/n₂. Thus Fisher's
formula gives a close approximation to the true formula of order $n^{-3/2}$, provided
that x/(2h√h) is approximately equal to (x³ + 3x)/(12h√h), i.e. if (x² + 3)/6 is approximately equal
to 1. For the 5 percent level, x = 1.6449, and (x² + 3)/6 = 0.951. Thus at the
5 percent level the use of √(h − 1) in (1) instead of √h extends the validity of
Fisher's approximation from order $n^{-1}$ to order $n^{-3/2}$.

¹ Fisher and Cornish also gave the two succeeding terms.

This ingenious device, however, requires adjustment at other levels of sig- 
nificance. The values of {x^ + 3)/6 at the principal significance levels are 
shown below. 


Significance level (%)    40     30     20     10     5      1      0.1
λ = (x² + 3)/6           0.51   0.55   0.62   0.77   0.95   1.40   2.09
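The entries of this table follow from the normal deviate x at each upper-tail level. A short Python sketch, assuming the levels are one-tailed upper percentage points (as the value x = 1.6449 at 5 percent implies); the function name `lam` is illustrative:

```python
from statistics import NormalDist

# Upper-tail significance levels (percent) used in the table above.
levels = [40, 30, 20, 10, 5, 1, 0.1]


def lam(level_percent):
    """lambda = (x**2 + 3) / 6, x being the normal deviate at the given upper-tail level."""
    x = NormalDist().inv_cdf(1.0 - level_percent / 100.0)
    return (x * x + 3.0) / 6.0


for level in levels:
    print(f"{level:>4}%  {lam(level):.2f}")
```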


If $\sqrt{h-1}$ in formula (1) is replaced by $\sqrt{h-\lambda}$, with the above values of λ, Fisher's formula will be approximately valid to order $n^{-3/2}$ at all levels of significance. In particular, for the tables already published of the 20, 1 and 0.1 percent points, λ may be taken as 0.6, 1.4 and 2.1 respectively. The values of z given by the use of $\sqrt{h-1}$ and $\sqrt{h-\lambda}$ are compared below for $n_1 = 24$, $n_2 = 60$.²


Significance    Approximate formula        Exact
level           √(h − 1)     √(h − λ)      value
20%             .1346        .1337         .1338
1%              .3723        .3748         .3746
0.1%            .4876        .4966         .4955


The use of $\sqrt{h-\lambda}$ gives values practically correct to 4 decimal places, except at the 0.1 percent level of significance, at which the higher terms become more important.
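The comparison for $n_1 = 24$, $n_2 = 60$ can be reproduced with the same kind of sketch, again assuming $h = 2/(1/n_1 + 1/n_2)$ and the assumed form of (1) with $h - 1$ replaced by $h - \lambda$. The name `z_point` is illustrative:

```python
from math import sqrt
from statistics import NormalDist


def z_point(level, n1, n2, lam):
    """x / sqrt(h - lam) - (x**2 + 2) / 6 * (1/n1 - 1/n2), with h = 2 / (1/n1 + 1/n2) (assumed)."""
    x = NormalDist().inv_cdf(1.0 - level)  # upper-tail normal deviate
    h = 2.0 / (1.0 / n1 + 1.0 / n2)
    return x / sqrt(h - lam) - (x * x + 2.0) / 6.0 * (1.0 / n1 - 1.0 / n2)


n1, n2 = 24, 60
for level, lam in [(0.20, 0.6), (0.01, 1.4), (0.001, 2.1)]:
    old = z_point(level, n1, n2, 1.0)  # divisor sqrt(h - 1)
    new = z_point(level, n1, n2, lam)  # divisor sqrt(h - lambda)
    print(f"{level:.1%}  {old:.4f}  {new:.4f}")
```

The √(h − λ) column agrees with the table to within one unit in the fourth decimal.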

With the aid of this formula, complete tabulation of the z-distribution for a given pair of high values of $n_1$ and $n_2$ is relatively simple. If very low probabilities at the tails are required, the further approximations given by Fisher and Cornish [3] may be used.
² The numerical terms in the approximate formula given for the 20 percent points on p. 28 of Fisher and Yates' Statistical Tables are in error. Their formula should read:


z 


0.8416 
A NOTE ON THE ANALYSIS OF VARIANCE WITH UNEQUAL CLASS FREQUENCIES¹

By Abraham Wald²

Let us consider p groups of variates and denote by $m_j$ ($j = 1, \cdots, p$) the number of elements in the j-th group. Let $x_{ij}$ be the i-th element of the j-th group. We assume that $x_{ij}$ is the sum of two variates $u_{ij}$ and $v_j$, i.e. $x_{ij} = u_{ij} + v_j$, where $u_{ij}$ ($i = 1, \cdots, m_j$; $j = 1, \cdots, p$) is normally distributed with mean μ and variance σ², and $v_j$ ($j = 1, \cdots, p$) is normally distributed with mean μ' and variance σ'². All the variates $u_{ij}$ and $v_j$ are supposed to be distributed independently.

The intraclass correlation ρ is given by³

$$\rho = \frac{\sigma'^2}{\sigma^2 + \sigma'^2}.$$

Confidence limits for ρ have been derived only in the case of equal class frequencies, i.e. $m_1 = m_2 = \cdots = m_p$. In this paper we shall deal with the problem of determining the confidence limits for ρ in the case of unequal class frequencies.

Since ρ is a monotonic function of $\sigma'^2/\sigma^2$, our problem is solved if we derive confidence limits for $\sigma'^2/\sigma^2$.

Denote by $\bar{x}_j$ the arithmetic mean of the j-th group, i.e.

$$\bar{x}_j = \frac{1}{m_j}\sum_{i=1}^{m_j} u_{ij} + v_j . \tag{1}$$

Hence the variance of $\bar{x}_j$ is equal to

$$\sigma^2_{\bar{x}_j} = \frac{\sigma^2}{m_j} + \sigma'^2 . \tag{2}$$

Denote $\sigma'^2/\sigma^2$ by $\lambda^2$. Then we have

$$\sigma^2_{\bar{x}_j} = \frac{\sigma^2}{w_j}, \tag{3}$$

where

$$w_j = \frac{m_j}{1 + m_j\lambda^2}. \tag{4}$$

¹ The author is indebted to Professor H. Hotelling for formulating the problem dealt with in this paper.

² Research under a grant-in-aid from the Carnegie Corporation of New York.

³ See for instance R. A. Fisher, Statistical Methods for Research Workers, 6th edition, p. 228.

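Now we shall prove that

$$\frac{1}{\sigma^2}\sum_{j=1}^{p} w_j\left(\bar{x}_j - \frac{\sum_{k=1}^{p} w_k\bar{x}_k}{\sum_{k=1}^{p} w_k}\right)^2 \tag{5}$$

has the χ²-distribution with p − 1 degrees of freedom. Let

$$y_j = \sqrt{w_j}\,\bar{x}_j \qquad (j = 1, \cdots, p)$$

and consider the orthogonal transformation

$$y'_1 = L_1(y_1, \cdots, y_p), \quad \cdots, \quad y'_{p-1} = L_{p-1}(y_1, \cdots, y_p), \quad y'_p = \frac{\sqrt{w_1}\,y_1 + \cdots + \sqrt{w_p}\,y_p}{\sqrt{w_1 + \cdots + w_p}},$$

where $L_1(y_1, \cdots, y_p), \cdots, L_{p-1}(y_1, \cdots, y_p)$ denote arbitrary homogeneous linear functions subject to the only condition that the transformation should be orthogonal.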

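Since the $y_j$ are independent, the mean value of $y_j$ is equal to $\sqrt{w_j}\,(\mu + \mu')$ and the variance of $y_j$ is equal to σ², we obviously have: the mean value of $y'_j$ ($j = 1, \cdots, p-1$) is equal to zero, because each $L_j$ is orthogonal to the vector $(\sqrt{w_1}, \cdots, \sqrt{w_p})$, to which the vector of means is proportional; and the variance of $y'_j$ ($j = 1, \cdots, p$) is equal to σ². In order to prove our statement, we have only to show that the expression (5) is equal to $\frac{1}{\sigma^2}(y'^2_1 + \cdots + y'^2_{p-1})$. If we substitute in (5) for $\bar{x}_j$, we get

$$\frac{1}{\sigma^2}\left[\sum_j w_j\bar{x}_j^2 - \frac{\left(\sum_j w_j\bar{x}_j\right)^2}{\sum_j w_j}\right] = \frac{1}{\sigma^2}\left[\sum_j y_j^2 - \frac{\left(\sum_j \sqrt{w_j}\,y_j\right)^2}{\sum_j w_j}\right] = \frac{1}{\sigma^2}\left[\sum_j y_j^2 - y'^2_p\right] = \frac{1}{\sigma^2}\left(y'^2_1 + \cdots + y'^2_{p-1}\right). \tag{5'}$$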
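The χ² statement about (5) can be checked by simulation. The Python sketch below uses illustrative group sizes and parameter values (μ = 0, μ' = 1, σ² = 1, σ'² = 0.5, none of them from the note), builds the weights from (4) with the true λ², and compares the sample mean and variance of (5) with those of a χ² with p − 1 degrees of freedom, namely p − 1 and 2(p − 1). The function names are hypothetical.

```python
import random


def statistic_5(means, weights, sigma2):
    """Expression (5): sum_j w_j (xbar_j - weighted mean)^2 / sigma^2."""
    xw = sum(w * x for w, x in zip(weights, means)) / sum(weights)
    return sum(w * (x - xw) ** 2 for w, x in zip(weights, means)) / sigma2


def simulate(sizes, mu, mu_prime, sigma2, sigma2_prime, reps, seed=1):
    """Draw x_ij = u_ij + v_j repeatedly and return the values of (5)."""
    rng = random.Random(seed)
    lam2 = sigma2_prime / sigma2                      # lambda^2
    weights = [m / (1.0 + m * lam2) for m in sizes]   # equation (4)
    values = []
    for _ in range(reps):
        means = []
        for m in sizes:
            v = rng.gauss(mu_prime, sigma2_prime ** 0.5)           # v_j
            u = [rng.gauss(mu, sigma2 ** 0.5) for _ in range(m)]  # u_ij
            means.append(sum(u) / m + v)                           # equation (1)
        values.append(statistic_5(means, weights, sigma2))
    return values


sizes = [2, 5, 3, 8]  # p = 4 groups of unequal size (illustrative)
vals = simulate(sizes, mu=0.0, mu_prime=1.0, sigma2=1.0, sigma2_prime=0.5, reps=20000)
mean = sum(vals) / len(vals)
var = sum((v - mean) ** 2 for v in vals) / (len(vals) - 1)
print(round(mean, 2), round(var, 2))  # chi-square with 3 d.f. has mean 3, variance 6
```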
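Since $\sum_j\sum_i (x_{ij} - \bar{x}_j)^2/\sigma^2$ has the χ² distribution with N − p degrees of freedom and is distributed independently of the $\bar{x}_j$, the expression

$$F = \frac{N-p}{p-1}\cdot\frac{\sum_j w_j\left(\bar{x}_j - \frac{\sum_k w_k\bar{x}_k}{\sum_k w_k}\right)^2}{\sum_j\sum_i (x_{ij} - \bar{x}_j)^2} \tag{6}$$

has the analysis of variance distribution with p − 1 and N − p degrees of freedom, where $N = m_1 + \cdots + m_p$. In case $m_1 = m_2 = \cdots = m_p = m$,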

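we have

$$F = \frac{N-p}{p-1}\cdot\frac{m}{1+m\lambda^2}\cdot\frac{\sum_j(\bar{x}_j - \bar{x})^2}{\sum_j\sum_i(x_{ij} - \bar{x}_j)^2}, \qquad \bar{x} = \frac{\sum_j\sum_i x_{ij}}{N}. \tag{6'}$$

Hence

$$F = \frac{F^*}{1 + m\lambda^2} \quad\text{and}\quad \lambda^2 = \frac{1}{m}\left(\frac{F^*}{F} - 1\right), \qquad\text{where } F^* = \frac{N-p}{p-1}\cdot\frac{m\sum_j(\bar{x}_j - \bar{x})^2}{\sum_j\sum_i(x_{ij} - \bar{x}_j)^2}.$$

If $F_1$ denotes the lower and $F_2$ the upper confidence limit of F, we obtain for λ² the confidence limits

$$\frac{1}{m}\left(\frac{F^*}{F_2} - 1\right) \quad\text{and}\quad \frac{1}{m}\left(\frac{F^*}{F_1} - 1\right).$$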
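For equal class frequencies the limits are explicit, and the corresponding limits for ρ follow from $\rho = \lambda^2/(1+\lambda^2)$. A minimal Python sketch, assuming $F^*$, m, $F_1$ and $F_2$ are already known (the numbers in the usage line are hypothetical, not table values); a negative limit is replaced by zero, mirroring the convention the note adopts for the general case:

```python
def equal_size_limits(f_star, m, f1, f2):
    """Limits for lambda^2 and rho when m_1 = ... = m_p = m.

    f_star : the statistic F* computed from the sample
    f1, f2 : lower and upper confidence limits for F (read from tables)
    A negative limit is replaced by zero, since lambda^2 cannot be negative.
    """
    lam2_low = max(0.0, (f_star / f2 - 1.0) / m)
    lam2_high = max(0.0, (f_star / f1 - 1.0) / m)

    def to_rho(lam2):
        # rho = sigma'^2 / (sigma^2 + sigma'^2) = lambda^2 / (1 + lambda^2)
        return lam2 / (1.0 + lam2)

    return (lam2_low, lam2_high), (to_rho(lam2_low), to_rho(lam2_high))


# Hypothetical numbers: F* = 10, m = 5, F_1 = 0.5, F_2 = 3.
lam2_limits, rho_limits = equal_size_limits(10.0, 5, 0.5, 3.0)
print(lam2_limits, rho_limits)
```

With these inputs the limits for λ² are 7/15 ≈ 0.467 and 3.8, and those for ρ are about 0.318 and 0.792.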




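Let us now consider the general case that $m_1, \cdots, m_p$ are arbitrary positive integers. First we shall show that the set of values of λ², for which (6) lies between its confidence limits $F_1$ and $F_2$, is an interval. For this purpose we have only to show that

$$f(\lambda^2) = \sum_j w_j\left(\bar{x}_j - \frac{\sum_k w_k\bar{x}_k}{\sum_k w_k}\right)^2$$

is monotonically decreasing with λ². In fact, since the derivative of f with respect to the weighted mean $\sum_k w_k\bar{x}_k/\sum_k w_k$ vanishes,

$$\frac{df(\lambda^2)}{d\lambda^2} = \sum_j \frac{dw_j}{d\lambda^2}\left(\bar{x}_j - \frac{\sum_k w_k\bar{x}_k}{\sum_k w_k}\right)^2.$$

Since

$$\frac{dw_j}{d\lambda^2} = -\frac{m_j^2}{(1+m_j\lambda^2)^2} = -w_j^2,$$

we have

$$\frac{df(\lambda^2)}{d\lambda^2} = -\sum_j w_j^2\left(\bar{x}_j - \frac{\sum_k w_k\bar{x}_k}{\sum_k w_k}\right)^2 < 0,$$

which proves our statement.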
Hence the lower confidence limit $\lambda_1^2$ of λ² is given by the root of the equation in λ²:

$$F = \frac{N-p}{p-1}\cdot\frac{f(\lambda^2)}{\sum_j\sum_i(x_{ij}-\bar{x}_j)^2} = F_2, \tag{7}$$

and the upper confidence limit $\lambda_2^2$ of λ² is given by the root of the equation in λ²:

$$F = F_1. \tag{8}$$

Since $f(\lambda^2)$ is monotonically decreasing, the equations (7) and (8) have at most one root in λ². If the equation (7) or (8) has no root, the corresponding confidence limit has to be put equal to zero. If neither (7) nor (8) has a root, we have to reject at least one of the hypotheses:

(1) $x_{ij} = u_{ij} + v_j$.

(2) The variates $u_{ij}$ and $v_j$ ($i = 1, \cdots, m_j$; $j = 1, \cdots, p$) are normally and independently distributed.

(3) Each of the variates $u_{ij}$ has the same distribution.

(4) Each of the variates $v_j$ has the same distribution.

The equations (7) and (8) are complicated algebraic equations in λ². For the actual calculation of the roots of these equations, well known approximation methods can be applied, making use also of the fact that the left members are monotonic functions of λ². In applying any approximation method it is very useful to start with two limits of the root which do not lie far apart. We shall give here a method of finding such limits.

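Denote by $\bar{F}$ the function which we obtain from F (formula (6)) by substituting $l_j$ for $m_j$ ($j = 1, \cdots, p$) in the weights $w_j$, so that they become $\bar{w}_j = l_j/(1 + l_j\lambda^2)$. Let $\bar{f}$ be the function obtained from f by the same process.

Denote by $\varphi(m, \lambda^2)$ the function which we obtain from $\bar{F}$ by substituting m for $l_1, \cdots, l_p$. We shall first show that $\bar{F}$ is non-decreasing with increasing $l_k$ ($k = 1, \cdots, p$), i.e. $\partial\bar{F}/\partial l_k \ge 0$. For this purpose we have only to show that $\partial\bar{f}/\partial l_k \ge 0$. We have:

$$\frac{\partial\bar{f}}{\partial l_k} = \frac{1}{(1+l_k\lambda^2)^2}\left(\bar{x}_k - \frac{\sum_j \bar{w}_j\bar{x}_j}{\sum_j \bar{w}_j}\right)^2 \ge 0.$$

Hence our statement is proved. Denote by m' the smallest and by m'' the greatest of the values $m_1, \cdots, m_p$. Then we obviously have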
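$$\varphi(m', \lambda^2) \le F \le \varphi(m'', \lambda^2). \tag{9}$$

Denote by $\lambda_1'^2$, $\lambda_1''^2$, $\lambda_2'^2$, $\lambda_2''^2$ the roots in λ² of the following equations respectively:

$$\varphi(m', \lambda^2) = F_2; \qquad \varphi(m'', \lambda^2) = F_2;$$

$$\varphi(m', \lambda^2) = F_1; \qquad \varphi(m'', \lambda^2) = F_1.$$

Since F and φ are monotonically decreasing with increasing λ², on account of (7), (8), and (9) we obviously have

$$\lambda_1'^2 \le \lambda_1^2 \le \lambda_1''^2 \qquad\text{and}\qquad \lambda_2'^2 \le \lambda_2^2 \le \lambda_2''^2.$$

The above inequalities give us the required limits.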
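As one illustration of a "well known approximation method", the Python sketch below solves (7) or (8) by bisection, starting from the bracket given by the roots of φ. With all $l_j$ equal to m the weights coincide, so $\varphi(m, \lambda^2) = \varphi(m, 0)/(1 + m\lambda^2)$ and these roots have a closed form. The function names, the sample data and the values of $F_1$, $F_2$ are hypothetical; in practice $F_1$ and $F_2$ would be read from tables of the analysis of variance distribution with p − 1 and N − p degrees of freedom.

```python
def f_ratio(groups, lam2, weight_sizes=None):
    """F of equation (6) at lambda^2.

    If weight_sizes is given, those numbers replace the m_j inside the weights
    w_j only; passing [m] * p therefore gives phi(m, lambda^2).
    """
    sizes = [len(g) for g in groups]
    means = [sum(g) / len(g) for g in groups]
    within = sum((x - xb) ** 2 for g, xb in zip(groups, means) for x in g)
    n, p = sum(sizes), len(groups)
    used = sizes if weight_sizes is None else weight_sizes
    w = [size / (1.0 + size * lam2) for size in used]
    xw = sum(wj * xj for wj, xj in zip(w, means)) / sum(w)
    f = sum(wj * (xj - xw) ** 2 for wj, xj in zip(w, means))
    return (n - p) / (p - 1) * f / within


def phi_root(groups, m, target):
    """Root in lambda^2 of phi(m, lambda^2) = target, using phi = phi(m, 0) / (1 + m lambda^2)."""
    phi0 = f_ratio(groups, 0.0, [m] * len(groups))
    return max(0.0, (phi0 / target - 1.0) / m)


def confidence_limit(groups, target, iterations=200):
    """Root of F(lambda^2) = target by bisection; 0.0 if there is no root."""
    if f_ratio(groups, 0.0) <= target:
        return 0.0
    sizes = [len(g) for g in groups]
    lo = phi_root(groups, min(sizes), target)  # by (9), F(lo) >= target
    hi = phi_root(groups, max(sizes), target)  # by (9), F(hi) <= target
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if f_ratio(groups, mid) > target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Made-up data: three groups of unequal size.
groups = [[1.0, 2.0, 3.0], [4.0, 5.0], [7.0, 9.0, 8.0, 10.0]]
F1, F2 = 0.05, 5.0  # hypothetical confidence limits for F
lower = confidence_limit(groups, F2)  # equation (7)
upper = confidence_limit(groups, F1)  # equation (8)
print(lower, upper)
```

Because F is monotonic in λ², plain bisection cannot miss the root, and the φ bracket keeps the number of steps small.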

Columbia University,

New York, N. Y. 


THE DISTRIBUTION OF QUADRATIC FORMS IN NON-CENTRAL 
NORMAL RANDOM VARIABLES 

By William G. Madow’ 

The following theorem is the algebraic basis of the theorem of R. A. Fisher 
and W. G. Cochran which states necessary and sufficient conditions that a set 
of quadratic forms in normally and independently distributed random variables 
should themselves be independently distributed in χ²-distributions.²

Theorem I. If the real quadratic forms $q_1, \cdots, q_m$ in $x_1, \cdots, x_n$ are such that

$$\sum_\gamma q_\gamma = \sum_i x_i^2, \tag{1}$$

and if the rank of $q_\gamma$ is $n_\gamma$, then a necessary and sufficient condition that

$$q_\gamma = \sum_\alpha z_\alpha^2, \tag{2}$$

where the real linear functions $z_\beta$ of the $x_\nu$ are defined by

$$x_\nu = \sum_\beta c_{\nu\beta}\, z_\beta, \tag{3}$$

is

$$n' = n. \tag{4}$$

Furthermore the system of linear forms (3) constitutes an orthogonal transformation.

¹ The letters i, j, μ, ν will assume all integral values from 1 through n, the letter γ will assume all integral values from 1 through m ($m \le n$), the letter α will assume all integral values from $n_0 + n_1 + \cdots + n_{\gamma-1} + 1$ through $n_1 + \cdots + n_\gamma$ ($n_0 = 0$, $n_1 + \cdots + n_m = n'$), the letters β, β' will assume all integral values from 1 through n', and the letters r, s will assume all integral values from 1 through n − 1.

² The references are: W. G. Cochran, "The Distribution of Quadratic Forms in a Normal System, with Applications to the Analysis of Covariance," Proc. Camb. Phil. Soc., Vol. 30 (1934), pp. 178-191, and R. A. Fisher, "Applications of 'Student's' Distribution," Metron, Vol. 6 (1926), pp. 90-104.

Proof: Necessity. Since the rank of a sum of quadratic forms is less than or equal to the sum of their ranks, it follows that $n' \ge n$. Upon substituting from (3) for the x's in (1), and using (2), it is seen that, for all values of the z's,

$$\sum_\beta z_\beta^2 = \sum_{\beta,\beta'}\Big(\sum_\nu c_{\nu\beta}c_{\nu\beta'}\Big) z_\beta z_{\beta'},$$

and hence it follows that

$$\sum_\nu c_{\nu\beta}c_{\nu\beta'} = \delta_{\beta\beta'}, \tag{5}$$

where $\delta_{\beta\beta'} = 0$ if $\beta \ne \beta'$, and $\delta_{\beta\beta'} = 1$ if $\beta = \beta'$. However, since the rank of the system of linear forms (3) is not greater than n, and since the matrix of (5) is the product of the matrix of (3) by its transposed matrix, it follows that (5) can be true only if n' is not greater than n. Consequently n' = n. It then is an immediate result of (5) that the transformation (3) is orthogonal.

Sufficiency. We assume that n' = n. By a real linear transformation of $x_1, \cdots, x_n$ we obtain linear forms $z_\beta$ such that

$$q_\gamma = \sum_\alpha c_\alpha z_\alpha^2,$$

where $c_\alpha = 1$ or −1. The linear functions $z_1, \cdots, z_n$ are linearly independent, for if $z_n \not\equiv 0$, and if real numbers $h_1, \cdots, h_{n-1}$, not all zero, exist such that, say,

$$z_n = \sum_r h_r z_r,$$

then

$$\sum_\beta c_\beta z_\beta^2 = \sum_{r,s} H_{rs} z_r z_s.$$

Substituting, we have

$$\sum_\gamma q_\gamma = \sum_{r,s} H_{rs}\sum_{\mu,\nu} c^{r\mu}c^{s\nu}x_\mu x_\nu,$$

where $z_r = \sum_\nu c^{r\nu}x_\nu$. (It is not assumed here that the matrix of the $c^{r\nu}$ is the inverse of the matrix of the $c_{\nu\beta}$. That fact is a consequence of this proof.) Denoting the matrix of $z_1, \cdots, z_{n-1}$ by $C_{n-1}$, we see that the matrix of $\sum_\gamma q_\gamma$ is $C'_{n-1}HC_{n-1}$, where H is the matrix of the $H_{rs}$; this matrix has rank less than or equal to n − 1, which contradicts the hypothesis (1) that $\sum_\gamma q_\gamma = \sum_i x_i^2$ has rank n. Hence if C is the matrix having the elements $c_\beta$ in its main diagonal and zeros elsewhere and if $C_n$ is the matrix of $z_1, \cdots, z_n$, it follows that

$$C'_n C C_n = I,$$

where I is the identity matrix, i.e. the matrix having ones in the main diagonal and zeros elsewhere, and $C_n$ is non-singular. Then $C = (C_n C'_n)^{-1}$, which is positive definite; since its diagonal elements are ±1, C is the identity matrix, and $C_n$ is orthogonal.

Among the hypotheses of the Fisher-Cochran theorem is the hypothesis that the mean value of $x_\nu$ is 0, and the variance of $x_\nu$ is σ². However, in connection with his analysis of the distribution of the multiple correlation coefficient,³ R. A. Fisher derived the distribution of the sum of the squares of n independently distributed random variables $x_1, \cdots, x_n$, the probability density of $x_\nu$ being given by

$$p(x_\nu) = (2\pi\sigma^2)^{-1/2}\exp\left[-\frac{(x_\nu - a_\nu)^2}{2\sigma^2}\right]. \tag{6}$$

More recently, P. C. Tang⁴ has used the distribution of the sum of non-central squares in his study of the power function of the analysis of variance test.

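In this note we extend the Fisher-Cochran theorem to non-central random variables. If the random variables $x_\nu$ are independently distributed with probability densities given by (6), Fisher and Tang have shown that if $\chi'^2 = \frac{1}{\sigma^2}\sum_\nu x_\nu^2$, then the probability density of χ'² is given by

$$p(\chi'^2) = e^{-\lambda - \chi'^2/2}\sum_{k=0}^{\infty}\frac{\lambda^k\,(\chi'^2)^{\frac{1}{2}n + k - 1}}{k!\;2^{\frac{1}{2}n + k}\;\Gamma(\frac{1}{2}n + k)}, \tag{7}$$

where $\lambda = \frac{1}{2\sigma^2}\sum_\nu a_\nu^2$.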


We now give necessary and sufficient conditions that a set of quadratic forms in normally and independently distributed random variables should themselves be independently distributed in χ'²-distributions.

Theorem II. Let $x_1, \cdots, x_n$ be independently distributed random variables, the random variable $x_\nu$ having probability density (6). Denote $\frac{1}{\sigma^2}\sum_\nu x_\nu^2$ by χ'² and denote $\frac{1}{2\sigma^2}\sum_\nu a_\nu^2$ by λ. Let $q_1, \cdots, q_m$ be quadratic forms,

$$q_\gamma = \sum_{\mu,\nu} a_{\gamma\mu\nu}\,x_\mu x_\nu,$$

such that $\sum_\gamma q_\gamma = \sum_\nu x_\nu^2$, and let the rank of $q_\gamma$ be denoted by $n_\gamma$.

³ R. A. Fisher, "The General Sampling Distribution of the Multiple Correlation Coefficient," Proc. Royal Soc. of London, (A), Vol. 121 (1928), pp. 664-673.

⁴ P. C. Tang, "The Power Function of the Analysis of Variance Tests with Tables and Illustrations of their Use," Statistical Research Memoirs, Vol. 2 (1938), pp. 126-149.
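A necessary and sufficient condition that the quadratic forms $\chi'^2_\gamma = q_\gamma/\sigma^2$ be independently distributed with joint probability density

$$p(\chi'^2_1, \cdots, \chi'^2_m) = \prod_\gamma p(\chi'^2_\gamma), \tag{8}$$

where $p(\chi'^2_\gamma)$ is given by (7) with $n_\gamma$ and $\lambda_\gamma$ in place of n and λ, and

$$\lambda_\gamma = \frac{1}{2\sigma^2}\sum_{\mu,\nu} a_{\gamma\mu\nu}\,a_\mu a_\nu, \tag{9}$$

is n' = n.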

Proof. Necessity. Tang⁵ has shown that the distribution of χ'² is given by (7) and that if the $\chi'^2_\gamma$ have joint distribution (8), then the distribution of $\chi'^2_1 + \cdots + \chi'^2_m$ (= χ'²) is (7) with n' in place of n. Upon comparing terms, we see that n' = n.

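Sufficiency. By Theorem I there exist n orthogonal linear functions (3) such that (2) is true. Then it is easy to see that the random variables $z_1, \cdots, z_n$ are independently distributed with joint probability density

$$p(z_1, \cdots, z_n) = (2\pi\sigma^2)^{-n/2}\exp\left[-\frac{1}{2\sigma^2}\sum_\beta (z_\beta - a'_\beta)^2\right], \tag{10}$$

where

$$a_\nu = \sum_\beta c_{\nu\beta}\,a'_\beta \qquad\text{and}\qquad a'_\beta = \sum_\nu c_{\nu\beta}\,a_\nu .$$

If we set $2\sigma^2\lambda_\gamma = \sum_\alpha a'^2_\alpha$, then we have, from (7) and (10), that the $\chi'^2_\gamma$ are independently distributed with joint probability density (8). It is only necessary to show that $\sum_\alpha a'^2_\alpha = \sum_{\mu,\nu} a_{\gamma\mu\nu}\,a_\mu a_\nu$ in order to complete the proof of the theorem. Now

$$\sum_{\mu,\nu} a_{\gamma\mu\nu}\,a_\mu a_\nu = \sum_{i,j}\Big(\sum_{\mu,\nu} a_{\gamma\mu\nu}\,c_{\mu i}c_{\nu j}\Big)a'_i a'_j .$$

On the other hand, by direct substitution for the $z_\alpha$ we see that

$$q_\gamma = \sum_{\mu,\nu}\Big(\sum_\alpha c_{\mu\alpha}c_{\nu\alpha}\Big)x_\mu x_\nu,$$

and hence $a_{\gamma\mu\nu} = \sum_\alpha c_{\mu\alpha}c_{\nu\alpha}$. Since (3) is an orthogonal transformation,

$$\sum_{\mu,\nu} a_{\gamma\mu\nu}\,c_{\mu i}c_{\nu j} = \sum_\alpha\Big(\sum_\mu c_{\mu\alpha}c_{\mu i}\Big)\Big(\sum_\nu c_{\nu\alpha}c_{\nu j}\Big) = \sum_\alpha \delta_{\alpha i}\,\delta_{\alpha j},$$

where $\delta_{\alpha i} = 0$ if $\alpha \ne i$ and $\delta_{\alpha i} = 1$ if $\alpha = i$, which completes the proof.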

It is emphasized that the form of $\lambda_\gamma$ makes it unnecessary to calculate the matrix of $q_\gamma$ to determine $\lambda_\gamma$, since the values $a_\nu$ need only be substituted for the $x_\nu$ in the original expression for $q_\gamma$ to determine $\lambda_\gamma$.
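A small numerical illustration of Theorem II, not taken from the note: the decomposition $\sum_\nu x_\nu^2 = n\bar{x}^2 + \sum_\nu (x_\nu - \bar{x})^2$ has ranks 1 and n − 1, so n' = n. Following the remark above, $\lambda_\gamma$ is obtained by putting $a_\nu$ in place of $x_\nu$ in each form and dividing by 2σ². The Python sketch then compares simulated means of $q_\gamma/\sigma^2$ with the mean $n_\gamma + 2\lambda_\gamma$ of the density (7). The means $a_\nu$, σ² and the function names are illustrative assumptions.

```python
import random


def noncentrality(a, sigma2):
    """lambda_gamma of (9) for q1 = n*xbar^2 and q2 = sum (x_nu - xbar)^2,
    found by putting a_nu in place of x_nu in each form."""
    n = len(a)
    abar = sum(a) / n
    lam1 = n * abar ** 2 / (2 * sigma2)
    lam2 = sum((ai - abar) ** 2 for ai in a) / (2 * sigma2)
    return lam1, lam2


def simulated_means(a, sigma2, reps, seed=7):
    """Monte Carlo means of q1/sigma^2 and q2/sigma^2 when x_nu ~ N(a_nu, sigma2)."""
    rng = random.Random(seed)
    n, sd = len(a), sigma2 ** 0.5
    s1 = s2 = 0.0
    for _ in range(reps):
        x = [rng.gauss(ai, sd) for ai in a]
        xbar = sum(x) / n
        s1 += n * xbar ** 2 / sigma2
        s2 += sum((xi - xbar) ** 2 for xi in x) / sigma2
    return s1 / reps, s2 / reps


a = [0.5, 1.0, -0.5, 2.0]
lam1, lam2 = noncentrality(a, 1.0)
print(lam1, lam2, lam1 + lam2)          # the two parts add up to lambda = 2.75
print(simulated_means(a, 1.0, 20000))   # near (1 + 2*lam1, 3 + 2*lam2) = (3.25, 6.25)
```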

Washington, D. C.



⁵ See the paper cited in footnote 4, p. 140.
TWO PROPERTIES OF SUFFICIENT STATISTICS


By Louis Olshevsky

The concept of sufficient statistics was introduced by R. A. Fisher in 1922. 
It was refined and extended in 1936 by Neyman and Pearson who gave defini- 
tions of shared sufficient statistics and sufficient sets of algebraically independent 
statistics.¹ Today the concept plays an important part in the theory of the
subject. Characterized briefly, a statistic associated with a single or specific 
population parameter is sufficient when no other statistic calculated from the 
same sample sheds any additional light on the value of the parameter. We 
shall prove that sets of sufficient statistics possess certain interconnections so 
that when one set is known every other set with a like number of members and 
linked with the same population parameters is discoverable. 

Theorem I. If $T_1, \cdots, T_m$ are a set of m ($m \le n$) algebraically independent sufficient statistics with regard to the parameters $\theta_1, \cdots, \theta_q$ and the probability law $p(x_1, \cdots, x_n \mid \theta_1, \cdots, \theta_q, \cdots, \theta_l)$, a necessary and sufficient condition for the sufficiency of any set of m algebraically independent statistics $T'_1, \cdots, T'_m$ with regard to the same parameters and the same probability distribution is that the $T'_i$ be a set of independent functions of the $T_j$ ($i, j = 1, \cdots, m$).

Proof: As an adjunct in the demonstration we cite the following theorem due to Neyman.² For a set of algebraically independent statistics $T_1, \cdots, T_m$ to be a sufficient set with regard to the parameters $\theta_1, \cdots, \theta_q$, it is necessary and sufficient that in any point of sample space, except perhaps for a set of measure zero, it should be possible to present the probability law in the form of the product

$$p(x_1, \cdots, x_n \mid \theta_1, \cdots, \theta_l) = p(T_1, \cdots, T_m \mid \theta_1, \cdots, \theta_q)\,\phi(x_1, \cdots, x_n;\ \theta_{q+1}, \cdots, \theta_l), \tag{1}$$

where $p(T_1, \cdots, T_m \mid \theta_1, \cdots, \theta_q)$ is the probability law of $T_1, \cdots, T_m$ and the function φ does not depend upon $\theta_1, \cdots, \theta_q$.

The sufficiency of the condition stated in the hypothesis of Theorem I is now immediately evident. For, if p' and φ' refer to the second set of algebraically independent statistics and $T'_i = T'_i(T_1, \cdots, T_m)$, where the functions are independent, the relations can be solved for the $T_j$ in terms of the $T'_i$, giving $T_j = T_j(T'_1, \cdots, T'_m)$,

$$p'(T'_1, \cdots, T'_m \mid \theta_1, \cdots, \theta_q) = p[T_1(T'_1, \cdots, T'_m), \cdots, T_m(T'_1, \cdots, T'_m) \mid \theta_1, \cdots, \theta_q]\left|\frac{\partial(T_1, \cdots, T_m)}{\partial(T'_1, \cdots, T'_m)}\right|,$$

$$\phi'(x_1, \cdots, x_n;\ \theta_{q+1}, \cdots, \theta_l) = \phi(x_1, \cdots, x_n;\ \theta_{q+1}, \cdots, \theta_l)\left|\frac{\partial(T'_1, \cdots, T'_m)}{\partial(T_1, \cdots, T_m)}\right|,$$

and

$$p(x_1, \cdots, x_n \mid \theta_1, \cdots, \theta_l) = p'(T'_1, \cdots, T'_m \mid \theta_1, \cdots, \theta_q)\,\phi'(x_1, \cdots, x_n;\ \theta_{q+1}, \cdots, \theta_l). \tag{2}$$

¹ See Neyman and Pearson: "Sufficient Statistics and Uniformly Most Powerful Tests of Statistical Hypotheses," Statistical Research Memoirs of the University of London, June 1936. The notation of the present paper is taken from this article.

² See Neyman's article in the Giornale dell'Istituto Italiano degli Attuari, Vol. VI, No. 4 (1935) as well as the memoir referred to in footnote 1.


Proof of the necessity is somewhat more involved. Since the $T_i$ and $T'_i$ are both sets of algebraically independent sufficient statistics with regard to $\theta_1, \cdots, \theta_q$, equations (1) and (2) are satisfied. They are, in fact, identities when the values of $T_1, \cdots, T_m$ and $T'_1, \cdots, T'_m$ in terms of the $x_i$ are substituted. Dividing (1) by (2) and rearranging leads to the equation

$$\frac{p(T_1, \cdots, T_m \mid \theta_1, \cdots, \theta_q)}{p'(T'_1, \cdots, T'_m \mid \theta_1, \cdots, \theta_q)} = \frac{\phi'(x_1, \cdots, x_n;\ \theta_{q+1}, \cdots, \theta_l)}{\phi(x_1, \cdots, x_n;\ \theta_{q+1}, \cdots, \theta_l)}. \tag{3}$$

The right side of (3) is free of $\theta_1, \cdots, \theta_q$. Therefore, in reality the left side must be too. If some or all of the parameters $\theta_1, \cdots, \theta_q$ enter formally into the left side, we can choose m + 1 sets of values $\theta_1^{(i)}, \cdots, \theta_q^{(i)}$ ($i = 1, \cdots, m+1$) such that each of the m + 1 functions $p(T_1, \cdots, T_m \mid \theta_1^{(i)}, \cdots, \theta_q^{(i)}) \div p'(T'_1, \cdots, T'_m \mid \theta_1^{(i)}, \cdots, \theta_q^{(i)})$ differs formally from all of the others. We can then, since each is equal to the right side of (3), which is free of $\theta_1, \cdots, \theta_q$, equate any one of these functions to the remaining m in turn. This provides m independent equations whose very existence proves that the $T'_i$ are functions of the $T_j$ and vice versa.

If none of the parameters $\theta_1, \cdots, \theta_q$ enters formally into the left side of (3), $p(T_1, \cdots, T_m \mid \theta_1, \cdots, \theta_q)$ must be of the form $p(T_1, \cdots, T_m)\,g(\theta_1, \cdots, \theta_q)$ and $p'(T'_1, \cdots, T'_m \mid \theta_1, \cdots, \theta_q)$ of the form $p'(T'_1, \cdots, T'_m)\,g(\theta_1, \cdots, \theta_q)$. In this case the original probability law $p(x_1, \cdots, x_n \mid \theta_1, \cdots, \theta_q, \cdots, \theta_l)$ contains $\theta_1, \cdots, \theta_q$ only nominally and there can be no talk of any statistics designed to estimate these parameters either singly or in combination.

When m = 1 and the set of algebraically independent statistics reduces to one, the single statistic is termed a shared sufficient statistic of the parameters $\theta_1, \cdots, \theta_q$.³ For this special case, Theorem I can be restated as follows. If T is a shared sufficient statistic with regard to the population parameters $\theta_1, \cdots, \theta_q$ and the probability distribution $p(x_1, \cdots, x_n \mid \theta_1, \cdots, \theta_q, \cdots, \theta_l)$, the necessary and sufficient condition for the sufficiency of any statistic T' with regard to the same parameters and the same probability distribution is that T' be a function of T. When m and q both equal one, the statistic becomes a sufficient statistic in the sense originally defined by Fisher in 1922.
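As an illustration of the case m = q = 1 (the example is ours, not the note's), take independent Bernoulli observations, for which $T = \sum x_i$ is a familiar sufficient statistic. The Python sketch checks the factorization in its equivalent conditional form: the factor $\phi = p(x \mid \theta)/p(T \mid \theta)$ must not depend on θ. A one-to-one function of T passes the check, while $x_1$ alone does not. The function names are hypothetical, and three values of θ stand in for "all θ".

```python
from itertools import product
from math import isclose


def sample_law(x, theta):
    """p(x_1, ..., x_n | theta) for independent Bernoulli(theta) observations."""
    k = sum(x)
    return theta ** k * (1 - theta) ** (len(x) - k)


def statistic_law(stat, t, n, theta):
    """p(stat = t | theta): sum of the sample law over all samples with stat(x) == t."""
    return sum(sample_law(x, theta) for x in product((0, 1), repeat=n) if stat(x) == t)


def factor_is_free_of_theta(stat, n, thetas=(0.2, 0.5, 0.8)):
    """True if p(x | theta) / p(stat(x) | theta) is the same for every theta tried."""
    for x in product((0, 1), repeat=n):
        ratios = [sample_law(x, th) / statistic_law(stat, stat(x), n, th) for th in thetas]
        if not all(isclose(r, ratios[0]) for r in ratios):
            return False
    return True


def total(x):          # T: the number of successes
    return sum(x)


def shifted_total(x):  # T' = 2T + 1, a one-to-one function of T
    return 2 * sum(x) + 1


def first_obs(x):      # x_1 alone, not a function of T
    return x[0]


for name, stat in [("T", total), ("T'", shifted_total), ("x_1", first_obs)]:
    print(name, factor_is_free_of_theta(stat, n=4))
```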

A physical law is independent of the coordinate system used to express it. This fact is taken account of in modern physics through the employment of tensors. One might hope for a parallel situation in the relation between sufficient statistics and the probability law to which they refer. Given any l-parameter family of distribution laws $p(x_1, \cdots, x_n \mid \theta_1, \cdots, \theta_l)$, the substitution $\theta_i = \theta_i(\theta'_1, \cdots, \theta'_l)$ ($i = 1, \cdots, l$) leads to the equally valid representation of the family

$$p'(x_1, \cdots, x_n \mid \theta'_1, \cdots, \theta'_l) = p[x_1, \cdots, x_n \mid \theta_1(\theta'_1, \cdots, \theta'_l), \cdots, \theta_l(\theta'_1, \cdots, \theta'_l)].$$

³ See the memoir mentioned in footnote 1.

Is a set of statistics sufficient with respect to the first representation also sufficient with respect to the second? The answer is partly in the affirmative and is given by the following proposition.

Theorem II. If the set of algebraically independent statistics $T_1, \cdots, T_m$ is sufficient with regard to the parameters $\theta_1, \cdots, \theta_q$ and the probability law $p(x_1, \cdots, x_n \mid \theta_1, \cdots, \theta_q, \cdots, \theta_l)$, it is also sufficient with regard to $\theta'_1, \cdots, \theta'_q$ and any other representation $p'(x_1, \cdots, x_n \mid \theta'_1, \cdots, \theta'_q, \cdots, \theta'_l)$ of the same probability law, provided $\theta'_i$ ($i = 1, \cdots, q$) are independent functions of $\theta_1, \cdots, \theta_q$ only and $\theta'_j$ ($j = q+1, \cdots, l$) are functions of $\theta_{q+1}, \cdots, \theta_l$ only.

Proof: The proof of the theorem is obvious. We are given the fact that $p(x_1, \cdots, x_n \mid \theta_1, \cdots, \theta_q, \cdots, \theta_l) = p(T_1, \cdots, T_m \mid \theta_1, \cdots, \theta_q)\,\phi(x_1, \cdots, x_n;\ \theta_{q+1}, \cdots, \theta_l)$. Since the $\theta'_i$ ($i = 1, \cdots, q$) are functions of $\theta_1, \cdots, \theta_q$ only and the $\theta'_j$ ($j = q+1, \cdots, l$) are functions of $\theta_{q+1}, \cdots, \theta_l$ only, it follows that $\theta_i = \theta_i(\theta'_1, \cdots, \theta'_q)$ ($i = 1, \cdots, q$) and $\theta_j = \theta_j(\theta'_{q+1}, \cdots, \theta'_l)$ ($j = q+1, \cdots, l$). Consequently,

$$p'(x_1, \cdots, x_n \mid \theta'_1, \cdots, \theta'_q, \cdots, \theta'_l) = p'(T_1, \cdots, T_m \mid \theta'_1, \cdots, \theta'_q)\,\phi'(x_1, \cdots, x_n;\ \theta'_{q+1}, \cdots, \theta'_l) \tag{4}$$

and the theorem is established.

New York, N. Y. 

NOTE ON THE MOMENTS OF A BINOMIALLY DISTRIBUTED VARIATE

By W. D. Evans

J. A. Joseph has given two interesting triangular arrangements of numbers, the second of which is reproduced herewith as Table 1.¹ The successive rows in this table are the coefficients in the expansion of $x^r$ as a function of the factorials, using the notation of the calculus of finite differences. For example,

$$x^4 = x^{(4)} + 6x^{(3)} + 7x^{(2)} + x^{(1)},$$

where

$$x^{(t)} = x(x-1)(x-2)\cdots(x-t+1).$$

Joseph points out that the coefficients may be used to generate the numbers of Laplace.

¹ J. A. Joseph, "On the Coefficients of the Expansion of …," Annals of Math. Stat., Vol. X (1939), p. 298.
A general expression defining any of the coefficients in terms of its place of occurrence in Table 1 may be set up. If we denote by $F_c(r)$ the number in row r and column c of the table, we have

$$F_c(r) = \sum_{k_1=1}^{r-c+1} k_1 \sum_{k_2=1}^{k_1} k_2 \sum_{k_3=1}^{k_2} k_3 \cdots \sum_{k_{c-1}=1}^{k_{c-2}} k_{c-1} \qquad (r \ge c). \tag{1}$$


This expression is of additional interest since the numbers defined by it are likewise the coefficients in the expression of the r-th moment about the origin of a binomially distributed variate in terms of the probability of the variate and the size of the sample in which it is contained. For example, it may be easily verified that if a is such a variate, p its probability of occurrence, and n the size of the sample in which it is contained,

$$E(a^2) = n^{(2)}p^2 + np,$$

$$E(a^3) = n^{(3)}p^3 + 3n^{(2)}p^2 + np,$$

$$E(a^4) = n^{(4)}p^4 + 6n^{(3)}p^3 + 7n^{(2)}p^2 + np,$$

and so on.

TABLE 1

Row r    F_1(r)   F_2(r)   F_3(r)   F_4(r)   F_5(r)
1        1
2        1        1
3        1        3        1
4        1        6        7        1
5        1        10       25       15       1

Ordinarily, computation of the higher moments of a binomially distributed variate is a tedious process of repeated differentiation. However, equation (1) immediately permits us to generalize the foregoing expressions to give the r-th moment of a as follows:

$$E(a^r) = \sum_{i=0}^{r-1} n^{(r-i)}p^{r-i}\sum_{k_1=1}^{r-i} k_1 \sum_{k_2=1}^{k_1} k_2 \cdots \sum_{k_i=1}^{k_{i-1}} k_i . \tag{2}$$

It will be noted that when c − 1 in equation (1) and i in equation (2) are equal to zero, the repeated summations vanish to be replaced by the value one.

By means of equation (2) much of the labor usually involved in expressing the r-th moment about the origin of a binomially distributed variate in terms of n and p may be avoided.
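Equations (1) and (2) translate directly into code. The Python sketch below builds Table 1 from the nested sums of (1), evaluates the r-th moment by (2), and compares it with the moment summed directly over the binomial probabilities. The function names and the test values n = 10, p = 0.3 are illustrative.

```python
from math import comb, isclose


def nested_sum(depth, upper):
    """sum_{k1=1}^{upper} k1 sum_{k2=1}^{k1} k2 ... with `depth` nested sums (1 if depth == 0)."""
    if depth == 0:
        return 1
    return sum(k * nested_sum(depth - 1, k) for k in range(1, upper + 1))


def table_entry(c, r):
    """F_c(r) of equation (1): the entry in row r, column c of Table 1 (r >= c)."""
    return nested_sum(c - 1, r - c + 1)


def falling(n, t):
    """Factorial n^(t) = n (n - 1) ... (n - t + 1)."""
    out = 1
    for j in range(t):
        out *= n - j
    return out


def moment_by_eq2(r, n, p):
    """E(a^r) for a binomial variate, from equation (2)."""
    return sum(falling(n, r - i) * p ** (r - i) * nested_sum(i, r - i) for i in range(r))


def moment_direct(r, n, p):
    """E(a^r) summed directly over the binomial probabilities."""
    return sum(comb(n, a) * p ** a * (1 - p) ** (n - a) * a ** r for a in range(n + 1))


print([table_entry(c, 5) for c in range(1, 6)])  # row 5 of Table 1: [1, 10, 25, 15, 1]
print(moment_by_eq2(4, 10, 0.3), moment_direct(4, 10, 0.3))
```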

Washington, D. C.



REPORT OF THE ANNUAL MEETING OF THE INSTITUTE 

The fifth annual meeting of the Institute of Mathematical Statistics was 
held in Philadelphia, Pennsylvania, on December 27 and 28, 1939, in conjunc- 
tion with the meetings of the American Statistical Association, the Econometric 
Society, and the American Sociological Society. The program for the meeting 
was arranged by Professor C. C. Craig. 

On Wednesday morning, December 27, the Institute held a session devoted to 
contributed papers on Statistical Theory and Methodology. Professor P. R. 
Rider, President of the Institute, presided. At that time the following papers 
were presented: 

1. On the unbiased character of certain likelihood-ratio tests when applied to normal systems.

Joseph F. Daly, The Catholic University of America. 

2. The product seminvariants of the mean and a central moment in samples. 

C. C. Craig, University of Michigan. 

3. A method for minimizing the sum of absolute values of deviations. 

Robert Singleton, Princeton Local Government Survey. 

4. On certain criteria for testing the homogeneity of k estimates of variance. 

C. Eisenhart and Frieda S. Swed, University of Wisconsin. 

5. On a test whether two samples are from the same population. 

A. Wald and J. Wolfowitz, Columbia University and Brooklyn, New York. 

6. The power functions of certain tests of significance in harmonic analysis and lag correlation.

William G. Madow, Washington, D. C. 

7. Some theoretical aspects of the use of transformations in the statistical analysis of rep- 
licated experiments. 

W. G. Cochran, Iowa State College. 

8. The standard errors of geometric and harmonic types of index numbers. 

Nilan Norris, Hunter College. 

9. A study of R. A. Fisher's z distribution and the related F distribution.

L. A. Aroian, Hunter College. 

10. A note on the analysis of variance with unequal class frequencies. 

Abraham Wald, Columbia University. 

11. An approach to problems involving disproportionate frequencies.

Burton D. Seeley, U. S. Department of Labor. 

Abstracts of these papers are given at the close of this report. 

Immediately following the session just described, the Institute held its annual 
business meeting. At that time President Rider announced that the newly 
elected officers for the year 1940 are: President, S. S. Wilks, Princeton University; Vice-Presidents: C. C. Craig, University of Michigan, and A. T. Craig,
University of Iowa; Secretary-Treasurer: P. R. Rider, Washington University. 

At one o'clock on the same day, members of the Institute and their guests 
attended the annual luncheon. At the luncheon, Professor B. H. Camp addressed the Institute on Non-standard Deviations.

On Wednesday afternoon, the Institute met jointly with the American Statis- 
tical Association for a program devoted to Lag Effects in Statistics and Eco- 
nomics. Professor J. D. Tamarkin presided and at this time the following 
papers were read: 

1. Lag effects in statistics and related problems.

A. J. Lotka, Metropolitan Life Insurance Company.

2. Some methods in the analysis of lag effects.

H. T. Davis, Northwestern University.

3. Lag effects in economics.

Charles F. Roos, Institute of Applied Econometrics, Inc. 

A joint session with the Biometric Section of the American Statistical Associa- 
tion was held on Wednesday evening, Professor George W. Snedecor presiding. 
The papers presented at this session, which dealt with Design and Analysis of
Replicated Experiments, were the following:

1. Practical difficulties met in the use of experimental designs.

A. E. Brandt, Soil Conservation Service.

2. Factorial design and covariance in the biological assay of vitamin D.

C. I. Bliss, Sandusky, Ohio.

3. Combinatorial problems in the design of experiments.

Gertrude M. Cox, Iowa State College.

4. Experimental trials with balanced incomplete blocks.

W. J. Youden, Boyce Thompson Institute.

On Thursday afternoon the Institute held consecutively joint sessions with
the American Sociological Society and the Econometric Society. At the first of
these, Professor William F. Ogburn presided and the following program was
presented:

1. How the mathematician can help the sociologist. 

Samuel A. Stouffer, University of Chicago. 

2. Some problems of combinations and permutations as they apply to a comprehensive classification of social groups.

George A. Lundberg, Bennington College.

Discussion: C. C. Craig, University of Michigan.

Philip M. Hauser, U. S. Bureau of the Census.

At the second session the topic for discussion was Recent Advances in Business
Cycle Analysis and these papers were given:

1. Recursive methods in business cycle analysis.

Merrill M. Flood, Princeton Surveys.

2. An appreciation of some recent mathematical business cycle theories.

Gerhard Tintner, Iowa State College.

3. The statisticians' new clothiers.

Arne Fisher, Western Union Telegraph Company. 


Paul R. Rider, Secretary. 



ABSTRACTS OF PAPERS 

(Presented on December 27, 1939, at the Philadelphia meeting of the Institute) 

On the Unbiased Character of Certain Likelihood-Ratio Tests when Applied to 
Normal Systems. Joseph F. Daly, The Catholic University of America. 

Consider a random sample of N observations on a set of variates, some of which are assumed to be normally distributed about means that are linear functions of the remaining, fixed, variates. One is sometimes required to decide whether the sample tends to contradict the further hypothesis, $H_0$, that the coefficients belonging to a certain subset of the fixed variates have specified values $b_0$. Such a situation occurs, for example, in the generalized analysis of variance. In this paper it is shown that the Neyman-Pearson method of the ratio of likelihoods yields a test of $H_0$ which is (at least locally) unbiased; in other words, this test is less likely to reject $H_0$ when the sample is in fact drawn from a normal population in which $b = b_0$ than when it is drawn from a normal population in which the b are different from but sufficiently close to $b_0$. In the special cases in which there is only one normal variate, or only one fixed variate in the subset tested, the proof goes through even without the restriction that the true b be close to $b_0$, a result which is also implicit in the papers by P. C. Tang and P. L. Hsu (Stat. Res. Mem. Vol. 2).

Similarly, with respect to the hypothesis that the deviations of the normal variates from their means fall into certain mutually independent sets, the λ-test is at least locally unbiased; and it has the additional property that the expected value of any positive integral power of λ is greater when this hypothesis is true than when the sample is drawn from any other normal population.

The Product Seminvariants of the Mean and a Central Moment in Samples. 

C. C. Craig, The University of Michigan.

The method used by the author in calculating the product seminvariants of a pair of 
central moments in samples is not adapted without modification to the present problem. 
In the present paper the necessary modification is developed which gives a routine method 
for the calculation of these sampling distribution characteristics. The calculation is a 
little heavier than in the previous case but the results for the mean and the second, third, 
and fourth central moments are given up to the fourth order except in one case in which the 
weight is 13. It is planned to follow this with a further study of the distribution of Fisher's 
( in samples from a normal population. 

A Method for Minimizing the Sum of Absolute Values of Deviations. Robert 
Singleton, Princeton Local Government Survey. 

E. C. Rhodes (Philosophical Magazine, May 1930) presented a method for the estimation 
of parameters in a linear regression where it is desired to minimize the sum of absolute 
values of the deviations. In this paper the structure of the deviation surface is analyzed 
and a method of steepest descent is developed which for computational purposes is an 
improvement over Rhodes’ method. The process is finite and leads to an exact solution. 
The method and the formulae used are such as to permit the successive additions of new
observations or sets of observations to the original data, or the exclusion of an observation 
from the original set, and the determination of the parameters for the sets of data so de- 
rived, with little additional labor. 
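Singleton's steepest-descent procedure is not reproduced here. As a hedged illustration of why this problem has a finite, exact solution, the sketch below uses a different and much slower method: for a fitted line y = a + bx (with at least two distinct x values), some optimal least-absolute-deviations line passes through at least two observations, so enumerating all pairs finds an exact minimum. The function name is hypothetical.

```python
from itertools import combinations

def lad_line(xs, ys):
    """Exact least-absolute-deviations fit of y = a + b*x by enumeration.

    Uses the fact that an optimal LAD line passes through at least two
    observations (a vertex of the underlying linear program). Cost is O(n^3),
    so this is only a check on small data, not a practical algorithm.
    Returns (sum of absolute deviations, intercept a, slope b).
    """
    best = None
    for i, j in combinations(range(len(xs)), 2):
        if xs[i] == xs[j]:
            continue  # a vertical line is not of the form y = a + b*x
        b = (ys[j] - ys[i]) / (xs[j] - xs[i])
        a = ys[i] - b * xs[i]
        sad = sum(abs(y - (a + b * x)) for x, y in zip(xs, ys))
        if best is None or sad < best[0]:
            best = (sad, a, b)
    if best is None:
        raise ValueError("need at least two distinct x values")
    return best
```

With four points on y = 1 + 2x and one gross outlier, the LAD line ignores the outlier, which is the usual motivation for minimizing absolute rather than squared deviations.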


On Certain Criteria for Testing the Homogeneity of k Estimates of Variance.

C. Eisenhart and Frieda S. Swed, University of Wisconsin.

Given k variance estimates $s_1^2, s_2^2, \dots, s_k^2$, with the $n_r s_r^2$ $(r = 1, 2, \dots, k)$ independently distributed as $\chi^2 \sigma_r^2$ for $n_r$ degrees of freedom, tests of the hypothesis $H_0$ that $\sigma_r^2 = \sigma^2$ $(r = 1, 2, \dots, k)$, where $\sigma^2$ is unknown, have been based to date on one or the other of the quantities

$$Q_1 = \sum_r \frac{w_r (s_r^2 - s^2)^2}{2 s^4},$$

$$Q_2 = w \log\left(\frac{\sum_r w_r s_r^2}{w}\right) - \sum_r w_r \log s_r^2,$$

where the $w_r$ are weights, $w = \sum w_r$, $n = \sum n_r$, and $n s^2 = \sum n_r s_r^2$. A. E. Brandt and W. L. Stevens have advocated the use of $Q_1$, referring an observed value of $Q_1$ to the $\chi^2$ distribution for $k - 1$ degrees of freedom. J. Neyman, E. S. Pearson, B. L. Welch, and M. S. Bartlett have advocated tests based on $Q_2$, Bartlett definitely proposing the use of degrees of freedom as weights, i.e. $w_r = n_r$, and recent work of E. J. G. Pitman and others has shown that unless $w_r = n_r$ tests based on $Q_2$ are biased. (A statistical test of an hypothesis H is said to be unbiased when the probability of rejecting H by its use is a minimum when H is true; obviously a desirable property.) When $w_r = n_r$, Bartlett has suggested that the distribution of $Q_2$ can be satisfactorily approximated by referring

$$Q_2 \Big/ \left[1 + \frac{1}{3(k-1)}\left(\sum_r \frac{1}{n_r} - \frac{1}{n}\right)\right]$$

to the $\chi^2$ distribution for $k - 1$ degrees of freedom. In this paper we discuss the adequacy of the $\chi^2$ distribution to describe the distribution of $Q_1$ and of the adjusted $Q_2$ when the degrees of freedom, $n_r$, are small.
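To make the two criteria concrete, the sketch below computes both with weights $w_r = n_r$, taking $Q_1 = \sum n_r (s_r^2 - s^2)^2 / 2s^4$ and $Q_2 = n \log s^2 - \sum n_r \log s_r^2$ with the pooled $s^2 = \sum n_r s_r^2 / n$, together with Bartlett's adjusted $Q_2 / [1 + (\sum 1/n_r - 1/n)/3(k-1)]$. Using $w_r = n_r$ in $Q_1$ is an assumption made here for illustration, and the function name is hypothetical. Each value would then be compared with a $\chi^2$ table for $k - 1$ degrees of freedom.

```python
import math

def homogeneity_criteria(s2, n):
    """Return (Q1, Q2, Bartlett-adjusted Q2) for variance estimates s2[r]
    with n[r] degrees of freedom, using weights w_r = n_r throughout."""
    k = len(s2)
    total_df = sum(n)
    pooled = sum(nr * v for nr, v in zip(n, s2)) / total_df   # n s^2 = sum n_r s_r^2
    q1 = sum(nr * (v - pooled) ** 2 for nr, v in zip(n, s2)) / (2 * pooled ** 2)
    q2 = total_df * math.log(pooled) - sum(nr * math.log(v) for nr, v in zip(n, s2))
    correction = 1 + (sum(1 / nr for nr in n) - 1 / total_df) / (3 * (k - 1))
    return q1, q2, q2 / correction
```

Both statistics vanish when all the $s_r^2$ are equal, and $Q_2 \ge 0$ always, since a weighted arithmetic mean is never less than the corresponding weighted geometric mean.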

U. S. Nair and D. J. Bishop have given theoretical evidence which suggests that when $n_r \ge 2$ $(r = 1, 2, \dots, k)$, Bartlett's adjusted $Q_2$ may be expected to conform to the $\chi^2$ distribution reasonably well in the neighborhood of the 5% and 1% levels. Using the 1000 samples of 4 for which $n_r s_r^2/(n_r + 1)$ has been tabulated by W. A. Shewhart in Table D, Appendix II of his "Economic Control of Quality of Manufactured Product," 200 values of $Q_1$ and of $Q_2$ (with adjustment) were calculated and compared with the $\chi^2$ distribution for $k - 1$ degrees of freedom. Two cases were studied: Case I, $k = 5$ and $n_1 = n_2 = \dots = n_5 = 3$; Case II, $k = 3$ and $n_1 = n_2 = 3$ while $n_3 = 9$. As measured by the Chi-Square Goodness of Fit Test, using 11 degrees of freedom, the fits were good in all four instances. In Case I, for Bartlett's adjusted $Q_2$ the test led to $.80 < P < .90$, and to $.70 < P < .80$ for the Brandt-Stevens $Q_1$; in Case II, the fits were poorer, with $.60 < P < .70$ for Bartlett's criterion and $.10 < P < .20$ for the Brandt-Stevens. However, an examination of the descending cumulative distributions showed that in all instances these criteria exhibited a deficiency of large values of $\chi^2$, the deficiency being, in general, more marked in the case of the Brandt-Stevens test. Consequently, when one uses significance levels for these criteria obtained by means of the $\chi^2$ approximation advocated, one is in reality using a level of significance slightly less than that professed. The discrepancy is not great, however, and is on the safe side, i.e. one will reject $H_0$ falsely in the long run less often than one professes to be doing. Presumably, however, one will also detect the falsehood of $H_0$ when $\sigma_r^2 \ne \sigma_s^2$ for at least one pair of values of r and $s \ne r$ less often in the long run by the use of these approximate significance levels than if the true levels were used, but we have no definite evidence at present on this point. A somewhat disquieting feature is that the agreement between the $\chi^2$ values yielded by the two criteria becomes worse as one proceeds toward larger values of $\chi^2$ in terms of either quantity. Thus, of the 8 samples which $Q_2$ would have rejected at the 5% level in Case I, only 4 would have been rejected by $Q_1$, and $Q_2$ would have passed 3 of the 7 samples rejected by $Q_1$. It therefore appears that, if one wishes to work with a given chance of rejecting $H_0$ falsely, one should choose one of these criteria and then stick to it in future applications. For large values of the $n_r$ the two criteria tend to equivalence, so the choice between them is of interest mainly for small $n_r$, but it cannot be made with full information until more is known about the bias, if any, of the Brandt-Stevens test and about the relative power of the two tests with regard to alternatives to $H_0$.


On a Test Whether Two Samples are from the Same Population. A. Wald and J. Wolfowitz, Columbia University and Brooklyn, New York.


Let X and Y be two independent random variables about whose distributions nothing is known except that they are continuous. Let $x_1, x_2, \dots, x_m$ be a set of m independent observations on X and let $y_1, y_2, \dots, y_n$ be a set of n independent observations on Y. The null hypothesis to be tested is that the distributions of X and Y are identical.

Let the set of m + n observations be arranged in order of magnitude, thus: $z_1, z_2, \dots, z_{m+n}$. Replace $z_i$ by $v_i$ $(i = 1, 2, \dots, m + n)$, where $v_i = 0$ if $z_i$ is a member of the set of x's and $v_i = 1$ if $z_i$ is a member of the set of y's. Since the null hypothesis states only that the distributions of X and Y are identical without specifying them in any other way, the distribution of the statistic U used for testing the null hypothesis must be independent of this common distribution of X and Y. It can easily be shown that the statistic U must be a function only of the sequence $v_1, v_2, \dots, v_{m+n}$.

A subsequence $v_s, v_{s+1}, \dots, v_{s+r}$ (where r may also be 0) is called a run if $v_s = v_{s+1} = \dots = v_{s+r}$, and if $v_{s-1} \ne v_s$ when $s > 1$, and $v_{s+r+1} \ne v_{s+r}$ when $s + r < m + n$. The statistic U, defined as the number of runs in the sequence $v_1, v_2, \dots, v_{m+n}$, seems a suitable statistic for testing the null hypothesis. A difference in the distribution functions of X and Y tends to decrease U. Hence the critical region is defined by the inequality $U < U_0$, where $U_0$ depends only on m, n, and the level of significance adopted. If $m \le n$ and $P\{U = c\}$ is the probability that $U = c$, then:

$$P\{U = 2k\} = \frac{2\binom{m-1}{k-1}\binom{n-1}{k-1}}{\binom{m+n}{m}} \qquad (k = 1, 2, \dots, m),$$

$$P\{U = 2k - 1\} = \frac{\binom{m-1}{k-1}\binom{n-1}{k-2} + \binom{m-1}{k-2}\binom{n-1}{k-1}}{\binom{m+n}{m}} \qquad (k = 2, 3, \dots, m + 1).$$

The mean of U is:

$$E(U) = \frac{2mn}{m + n} + 1.$$

The variance of U is:

$$\sigma_U^2 = \frac{2mn(2mn - m - n)}{(m + n)^2 (m + n - 1)}.$$

If $m/n \to \alpha$ (a positive constant) and $n \to \infty$, the distribution of U converges to the normal distribution.
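The exact null distribution of the number of runs can be tabulated directly from the probabilities for $U = 2k$ and $U = 2k - 1$. The sketch below (hypothetical function names) counts runs in a pooled sample, builds that distribution with exact rational arithmetic, treating binomial coefficients outside their range as 0, and returns the stated mean and variance so the three can be checked against each other. It assumes there are no ties, which holds with probability 1 for continuous distributions.

```python
from fractions import Fraction
from math import comb

def _c(a, b):
    # Binomial coefficient taken as 0 outside 0 <= b <= a (math.comb rejects b < 0).
    return comb(a, b) if 0 <= b <= a else 0

def count_runs(xs, ys):
    """Number of runs U after pooling and sorting; x's are labelled 0, y's 1."""
    labels = [lab for _, lab in sorted([(x, 0) for x in xs] + [(y, 1) for y in ys])]
    return 1 + sum(a != b for a, b in zip(labels, labels[1:]))

def runs_pmf(m, n):
    """Exact P{U = u} under the null hypothesis, as Fractions keyed by u."""
    total = comb(m + n, m)
    pmf = {}
    for k in range(1, m + n + 1):
        even = 2 * _c(m - 1, k - 1) * _c(n - 1, k - 1)
        odd = _c(m - 1, k - 1) * _c(n - 1, k - 2) + _c(m - 1, k - 2) * _c(n - 1, k - 1)
        if even:
            pmf[2 * k] = Fraction(even, total)
        if odd:
            pmf[2 * k - 1] = Fraction(odd, total)
    return pmf

def runs_mean_var(m, n):
    """Closed-form mean and variance of U."""
    mean = Fraction(2 * m * n, m + n) + 1
    var = Fraction(2 * m * n * (2 * m * n - m - n), (m + n) ** 2 * (m + n - 1))
    return mean, var
```

Since small U is the critical direction, a significance probability for an observed value u is `sum(p for v, p in runs_pmf(m, n).items() if v <= u)`.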


The Distribution of Quadratic Forms in Non-Central Normal Random Variables. William G. Madow, Washington, D. C. (Presented to the Institute under a slightly different title)
Let the distribution of a sum of non-central squares of normally and independently distributed random variables which have unit variances be called the $\chi'^2$ distribution. It is proved that if a set of quadratic forms have a sum which is the sum of the squares of their variables, then a necessary and sufficient condition that the quadratic forms be independently distributed in $\chi'^2$ distributions is that the rank of the sum of the quadratic forms be equal to the sum of the ranks of the quadratic forms. Furthermore, the constants on which the $\chi'^2$ distributions depend may be obtained by substituting the values about which the variables are taken for the variables themselves in the quadratic forms. Roughly speaking, the theorem states that if a set of quadratic forms satisfy the conditions of the Fisher-Cochran theorem when the true means vanish, then the set of quadratic forms will be independently distributed in $\chi'^2$ distributions when the true means do not vanish.
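A small numerical illustration, using the familiar split $\sum x_i^2 = n\bar{x}^2 + \sum (x_i - \bar{x})^2$ with hypothetical true means: the ranks 1 and n - 1 add up to n, and substituting the means for the variables gives the constants $\mu' A \mu$. With that (assumed) convention for the non-centrality constant, a $\chi'^2$ variate with r degrees of freedom has mean $r + \mu' A \mu$, which the Monte Carlo step checks. This checks means only; it is not a demonstration of independence.

```python
import numpy as np

def mean_and_deviation_forms(mu):
    """Matrices of n*xbar^2 and of sum (x_i - xbar)^2, their ranks, and the
    constants mu' A mu obtained by substituting the means for the variables."""
    mu = np.asarray(mu, dtype=float)
    n = mu.size
    J = np.ones((n, n)) / n
    forms = [J, np.eye(n) - J]
    assert np.allclose(forms[0] + forms[1], np.eye(n))  # the forms sum to sum x_i^2
    ranks = [int(np.linalg.matrix_rank(A)) for A in forms]
    constants = [float(mu @ A @ mu) for A in forms]
    return forms, ranks, constants

def simulated_means(mu, forms, size=200_000, seed=0):
    """Monte Carlo averages of x' A x for x ~ N(mu, I)."""
    rng = np.random.default_rng(seed)
    x = np.asarray(mu, dtype=float) + rng.standard_normal((size, len(mu)))
    return [float(np.mean(np.einsum("ki,ij,kj->k", x, A, x))) for A in forms]
```

Because the ranks add up to n, the theorem says the two forms are independent $\chi'^2$ variates whatever the true means, exactly as in the central case.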

Some Theoretical Aspects of the Use of Transformations in the Statistical 
Analysis of Replicated Experiments. W. G. Cochran, Iowa State College. 

The device of transforming the data to a different scale before performing an analysis of 
variance has recently been recommended by a number of writers for replicated experiments 
in which the original data show a markedly skew distribution. The use of transformations 
to obtain an approximate analysis has been supported mainly on the grounds that in the 
transformed scale the true experimental error variance is approximately the same on all 
plots. This paper considers the relation of the method of transformations to a more exact 
analysis. Discussion is confined to the $\sqrt{x}$ and $\sin^{-1}\sqrt{x}$ transformations, which appear
to receive the most frequent use in practice. 

To obtain an exact analysis, it is necessary to specify (i) how the expected value on any plot is obtained from unknown parameters representing the treatment and block (or row and column) effects, and (ii) how the observed values on the plots vary about the expected values. If the latter variation follows the Poisson law (a case to which the square root transformation has been considered appropriate), the equations of estimation by maximum likelihood take the form

$$\sum \frac{x - m}{m}\,\frac{\partial m}{\partial c} = 0, \qquad (1)$$

where x is the observed and m the expected value on any plot, c is a typical unknown parameter, and the summation extends over all plots whose expectations involve c. As the number of parameters is usually large (e.g. 16 in a 6 x 6 Latin square), these equations are laborious to solve; moreover, the question of obtaining small-sample tests of significance is difficult. It is shown that if a particular form can be assumed for the prediction formula in (i), namely that $\sqrt{m}$ is a linear function of the treatment and block (or row and column) constants, the equations of estimation may be reduced to the simpler form

$$\sum (x' - \sqrt{m}) = 0, \qquad (2)$$

where $x' = \tfrac{1}{2}\left(x/\sqrt{m} + \sqrt{m}\right)$ is a function closely related to the square root of x. It follows that the statistical analysis in square roots, with some slight adjustments, coincides with the maximum likelihood solution, provided that the above form can be assumed for the prediction formula. The appropriateness of this form in practice is briefly considered and a "goodness of fit" test by $\chi^2$ is developed. A numerical example is worked as an illustration and indicates that a good approximation is obtained by the transformation alone, even with very small numbers per plot. The corresponding theory is also discussed for the inverse sine transformation, which applies where the original data are percentages or fractions whose experimental errors are derived from the binomial distribution.
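When $\sqrt{m}$ is additive in the constants, equation (1) becomes $\sum (x - m)/\sqrt{m} = 0$ over the plots involving each constant, which is $\sum (x' - \sqrt{m}) = 0$ with $x' = (x + m)/(2\sqrt{m})$. One way to solve this, sketched below under stated assumptions, is a fixed-point iteration: compute x' from the current fit, fit the additive model to x' by ordinary least squares, and repeat. The layout (a complete rows x columns table, one plot per cell, $\sqrt{m}$ = mean + row + column constant) and the function names are illustrative assumptions, and this is not necessarily the computing procedure Cochran used.

```python
import numpy as np

def additive_fit(z):
    """Least-squares fit of grand mean + row constant + column constant
    to a complete two-way table z (one observation per cell)."""
    grand = z.mean()
    rows = z.mean(axis=1, keepdims=True) - grand
    cols = z.mean(axis=0, keepdims=True) - grand
    return grand + rows + cols

def sqrt_ml_fit(x, iters=100, tol=1e-10):
    """Fitted sqrt(m) solving sum (x' - sqrt(m)) = 0 over every row and column,
    starting from the plain square-root analysis."""
    x = np.asarray(x, dtype=float)
    root_m = additive_fit(np.sqrt(x))
    for _ in range(iters):
        if np.any(root_m <= 0):
            raise ValueError("fitted sqrt(m) must stay positive")
        x_prime = 0.5 * (x / root_m + root_m)   # working variate x'
        new = additive_fit(x_prime)
        if np.max(np.abs(new - root_m)) < tol:
            return new
        root_m = new
    return root_m
```

For a complete two-way table, the least-squares fit leaves residuals with zero row and column sums, so at convergence the residuals $x' - \sqrt{m}$ satisfy (2) for every row and column constant. The starting value is the ordinary square-root analysis, and with moderate counts the two fits typically differ only slightly.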
In practice the type of analysis outlined above is unlikely to supplant the simple use of transformations, because it can seldom be assumed that the experimental variance is entirely of the Poisson or binomial type. The more exact analysis may, however, be useful (i) for cases in which the plot yields are very small integers or the ratios of very small integers, and (ii) in showing how to give proper weight to an occasional zero plot yield.

The Standard Errors of Geometric and Harmonic Types of Index Numbers. 

By Nilan Norris, Hunter College. 

Various statisticians have made empirical studies of the sampling errors of certain types 
of index numbers used in the United States and England. None of these writers has taken 
advantage of the tools afforded by the modern theory of estimation, including fiducial 
inference, as a means of arriving at direct and general expressions for estimating the stand- 
ard deviations of the sampling errors of geometric and harmonic types of index numbers. 

A known expression for the first approximation to the variance of a function, as given by 
the relation between the variance of the function and the variance of the argument, is 
valid for that general class of distributions of which the variance and a higher moment 
are finite. With the aid of this relation, simple and useful forms are obtained for estimating the standard errors of geometric and harmonic types of indexes. For sufficiently large samples, these forms are valid for all of the types of distributions of price relatives, production relatives, and similar observations ordinarily encountered, provided that the necessary conditions are satisfied for drawing sound inferences on the basis of sampling without reference to the value of the variate.
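The first approximation referred to is the familiar relation $\operatorname{var} f(\bar{u}) \approx f'(\mu)^2 \operatorname{var} \bar{u}$. Applied to a geometric index $G = \exp(\overline{\log r})$ it gives $\operatorname{var} G \approx G^2 \operatorname{var}(\log r)/n$, and applied to a harmonic index $H = 1/\overline{(1/r)}$ it gives $\operatorname{var} H \approx H^4 \operatorname{var}(1/r)/n$. The sketch below is a minimal illustration of those two formulas for a simple random sample of relatives. It is not Norris's own expressions, and it assumes unweighted indexes, independent relatives, and the sample variance with divisor n - 1. The function names are hypothetical.

```python
import math
import statistics

def geometric_index_se(relatives):
    """Unweighted geometric index G and its first-approximation standard error."""
    n = len(relatives)
    logs = [math.log(r) for r in relatives]
    g = math.exp(statistics.fmean(logs))
    # var(G) ~ G^2 * var(mean of log r) = G^2 * var(log r) / n
    return g, g * math.sqrt(statistics.variance(logs) / n)

def harmonic_index_se(relatives):
    """Unweighted harmonic index H and its first-approximation standard error."""
    n = len(relatives)
    recips = [1.0 / r for r in relatives]
    h = 1.0 / statistics.fmean(recips)
    # dH/d(mean of 1/r) = -H^2, so var(H) ~ H^4 * var(1/r) / n
    return h, h ** 2 * math.sqrt(statistics.variance(recips) / n)
```

The approximation is only as good as the linearization, which is why the paragraph above restricts these forms to sufficiently large samples.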

Necessary conditions for using tests of significance soundly in connection with index 
number problems are those of realistic and intimate acquaintance with observations, and 
careful attention to certain broad theoretical considerations which determine whether or 
not the index is suited for the purpose for which it is used. 

A Study of R. A. Fisher’s z Distribution and the Related F Distribution. L. A. 
Aroian, Hunter College. 

The following results for the z distribution and the related F distribution are investigated:

(1) Geometric properties. 

(2) Exact values of the seminvariants and moments of z. Exact values of the first 
four central moments of F. 

(3) The approach to normality of both distributions as $n_1$ and $n_2$ become large in any
manner whatever.

(4) The Pearson types of approximating curves, the logarithmic normal approximation, 
the Gram-Charlier approximation, and the uses of these in finding any level of 
significance of z and of F. 

A Note on the Analysis of Variance with Unequal Class Frequencies. Abraham 
Wald, Columbia University. 

Let us consider p groups of variates and denote by $m_j$ $(j = 1, \dots, p)$ the number of elements in the j-th group. Let $x_{ij}$ be the i-th element in the j-th group. We assume that $x_{ij}$ is the sum of two variates $\epsilon_{ij}$ and $\eta_j$, i.e. $x_{ij} = \epsilon_{ij} + \eta_j$, where $\epsilon_{ij}$ $(i = 1, \dots, m_j;\ j = 1, \dots, p)$ is normally distributed with mean $\mu$ and variance $\sigma^2$, and $\eta_j$ $(j = 1, \dots, p)$ is normally distributed with mean $\mu'$ and variance $\sigma'^2$. All the variates $\epsilon_{ij}$ and $\eta_j$ are supposed to be distributed independently. The intra-class correlation $\rho$ is given by

$$\rho = \frac{\sigma'^2}{\sigma^2 + \sigma'^2}.$$
Confidence limits for $\rho$ have been derived only in the case of equal class frequencies, i.e. $m_1 = m_2 = \dots = m_p$. We give here the confidence limits for $\rho$ in the case of unequal class frequencies. Since $\rho$ is a monotonic function of $\sigma'^2/\sigma^2$, it is sufficient to derive confidence limits for $\sigma'^2/\sigma^2$. Denote $\sigma'^2/\sigma^2$ by $\lambda^2$ and the arithmetic mean of the j-th group by $\bar{x}_j$. Let

$$w_j = \frac{m_j}{1 + m_j \lambda^2},$$

and denote by $F_1$ and $F_2$ the lower and upper confidence limits respectively of F, where F has the analysis of variance distribution with $p - 1$ and $N - p = m_1 + \dots + m_p - p$ degrees of freedom. Then the lower confidence limit $\lambda_1^2$ of $\lambda^2$ is given by the root of the equation in $\lambda^2$:

$$f(\lambda^2) = \frac{N - p}{p - 1} \cdot \frac{\sum_j w_j \left(\bar{x}_j - \dfrac{\sum_j w_j \bar{x}_j}{\sum_j w_j}\right)^2}{\sum_j \sum_i (x_{ij} - \bar{x}_j)^2} = F_2, \qquad (1)$$

and the upper confidence limit $\lambda_2^2$ of $\lambda^2$ is given by the root of

$$f(\lambda^2) = F_1. \qquad (2)$$

For calculating the roots of (1) and (2), we can make use of the fact that $f(\lambda^2)$ is monotonically decreasing with increasing $\lambda^2$.
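Because $f(\lambda^2)$ decreases as $\lambda^2$ increases, each confidence limit can be found by bisection. In the sketch below, f is computed with weights $w_j = m_j/(1 + m_j\lambda^2)$, the weighted mean of the group means, and the pooled within-group sum of squares. The percentage points $F_1$ and $F_2$ must be taken from an F table for $p - 1$ and $N - p$ degrees of freedom. Returning 0 when $f(0)$ already falls below the target is a truncation convention assumed here, not something the abstract specifies. The function names are hypothetical.

```python
def f_stat(groups, lam2):
    """f(lambda^2) for groups of unequal size, each a list of observations."""
    p = len(groups)
    N = sum(len(g) for g in groups)
    means = [sum(g) / len(g) for g in groups]
    w = [len(g) / (1.0 + len(g) * lam2) for g in groups]
    xw = sum(wj * mj for wj, mj in zip(w, means)) / sum(w)   # weighted mean
    between = sum(wj * (mj - xw) ** 2 for wj, mj in zip(w, means))
    within = sum((x - mj) ** 2 for g, mj in zip(groups, means) for x in g)
    return (N - p) / (p - 1) * between / within

def solve_lambda2(groups, F, hi=1e6, iters=200):
    """Root of f(lambda^2) = F on [0, hi] by bisection, using that f decreases.

    Returns 0.0 when f(0) <= F, i.e. when no positive root exists.
    """
    if f_stat(groups, 0.0) <= F:
        return 0.0
    if f_stat(groups, hi) > F:
        raise ValueError("increase hi: the root lies beyond the search interval")
    lo = 0.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if f_stat(groups, mid) > F:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

With table values in hand, `lo = solve_lambda2(groups, F2)` and `up = solve_lambda2(groups, F1)` give the limits for $\lambda^2$, and `lo / (1 + lo)` and `up / (1 + up)` the corresponding limits for $\rho$. At $\lambda^2 = 0$, f reduces to the ordinary one-way analysis of variance F ratio.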


An Approach to Problems Involving Disproportionate Frequencies. Burton D. Seeley, Washington, D. C.

Applied mechanics offers an analysis of variance solution to problems of multiple classification involving disproportionate sub-class numbers. The quality of orthogonality may be attained in such problems by measuring the variability between classes of any one classification after centering the others. This approach, which is not limited by the number of classes or the number of classifications, treats the problem involving equal sub-class numbers as a special phase of the general analysis of variance.



CONSTITUTION 

OF THE

INSTITUTE OF MATHEMATICAL STATISTICS 

ARTICLE I 
Name and Purpose

1. This organization shall be known as the Institute of Mathematical Statistics.

2. Its object shall be to promote the interests of mathematical statistics. 

ARTICLE II 

Membership 

1. The membership of the Institute shall consist of Members, Fellows, Honorary 
Members, and Sustaining Members. 

2. Voting members of the Institute shall be (a) the Fellows, and (b) all others who 
have been members for twenty-three months prior to the date of voting.

ARTICLE III 

Officers, Board of Directors, Committee on Membership, and Committee on 

Publications 

1. The Officers of the Institute shall be a President, two Vice-Presidents, and a Secretary-Treasurer, elected for a term of one year by a majority ballot at the annual meeting
of the Institute. Voting may be in person or by mail. 

(a) Exception. The first group of Officers shall be elected by a majority vote of the 
individuals present at the organization meeting, and shall serve until December 31, 1936. 

2. The Board of Directors of the Institute shall consist of the Officers and the previous 
President. 

3. The Institute shall have a Committee on Membership composed of three Fellows. 
At their first meeting subsequent to the adoption of this Constitution, the Board of 
Directors shall elect three members as Fellows to serve as the Committee on Membership, 
one member of the Committee for a term of one year, another for a term of two years, 
and another for a term of three years. Thereafter the Board of Directors shall elect 
from among the Fellows one member annually at their first meeting after their election 
for a term of three years. The president shall designate one of the Vice-Presidents as 
Chairman of this Committee. 

4. The Institute shall have a Committee on Publications composed of three Members 
or Fellows elected by the Board of Directors. The President shall designate a Vice- 
President as Ex Officio Chairman of this Committee. 

ARTICLE IV 
Meetings

1. A meeting for the presentation and discussion of papers, for the election of Officers, 
and for the transaction of other business of the Institute shall be held annually at such 
time as the Board of Directors may designate. Additional meetings may be called from 
time to time by the Board of Directors and shall be called at any time by the President
upon written request from ten Fellows. Notice of the time and place of meeting shall be 
given to the membership by the Secretary-Treasurer at least thirty days prior to the 
date set for the meeting. All meetings except executive sessions shall be open to the 
public. Only papers accepted by a Program Committee appointed by the President may 
be presented to the Institute. 

2. The Board of Directors shall hold a meeting immediately after their election and 
again immediately before the expiration of their term. Other meetings of the Board
may be held from time to time at the call of the President or any two members of the 
Board. Notice of each meeting of the Board, other than the two regular meetings, 
together with a statement of the business to be brought before the meeting, must be 
given to the members of the Board by the Secretary-Treasurer at least five days prior to 
the date set therefor. Should other business be passed upon, any member of the Board 
shall have the right to reopen the question at the next meeting. 

3. The Committee on Membership shall hold a meeting immediately after the annual 
meeting of the Institute. Further meetings of the Committee may be held from time to 
time at the call of the Chairman or any member of the Committee provided notice of such 
call and the purpose of the meeting is given to the members of the Committee by the 
Secretary-Treasurer at least five days before the date set therefor. Should other business 
be passed upon, any member of the Committee shall have the right to reopen the ques- 
tion at the next meeting. 

4. At a regularly convened meeting of the Board of Directors, three members shall 
constitute a quorum. At a regularly convened meeting of the Committee on Member- 
ship, two members shall constitute a quorum. 

ARTICLE V 

Publications 

1. The Annals of Mathematical Statistics shall be the Official Journal for the Institute.
Other publications may be originated by the Board of Directors as occasion arises. 

ARTICLE VI 
Expulsion or Suspension 

1. Except for non-payment of dues, no one shall be expelled or suspended except by 
action of the Board of Directors with not more than one negative vote. 

ARTICLE VII 

Amendments 

1. This constitution may be amended by an affirmative two-thirds vote at any regu- 
larly convened meeting of the Institute provided notice of such proposed amendment 
shall have been sent to each voting member by the Secretary-Treasurer at least thirty 
days before the date of the meeting at which the proposal is to be acted upon. Voting 
may be in person or by mail. 

BY-LAWS 

ARTICLE I 

Duties of the Officers, Board of Directors, Committee on Membership, and

Committee on Publications 

1. The President, or in his absence, one of the Vice-Presidents, or in the absence of the 
President and both Vice-Presidents, a Fellow selected by vote of the Fellows present, 
shall preside at the meetings of the Institute and of the Board of Directors. At meetings
of the Institute, the presiding officer shall vote only in the case of a tie, but at meetings 
of the Board of Directors he may vote in all cases. At least three months before the date 
of the annual meeting, the President shall appoint a Nominating Committee of three 
members. It shall be the duty of the Nominating Committee to make nominations for 
Officers to be elected at the annual meeting and the Secretary-Treasurer shall notify all 
voting members at least thirty days before the annual meeting. Additional nomina- 
tions may be submitted in writing, if signed by at least ten Fellows of the Institute, up to 
the time of the meeting. 

2. The Secretary-Treasurer shall keep a full and accurate record of the proceedings 
at the meetings of the Institute and of the Board of Directors, send out calls for said 
meetings and, with the approval of the President and the Board, carry on the corre- 
spondence of the Institute. Subject to the direction of the Board, he shall have charge 
of the archives and other tangible and intangible property of the Institute. He shall 
send out calls for annual dues and acknowledge receipt of same; pay all bills approved 
by the President for expenditures authorized by the Board or the Institute; keep a 
detailed account of all receipts and expenditures, prepare a financial statement at the
end of each year and present an abstract of the same at the annual meeting of the Insti- 
tute after it has been audited by a Member or Fellow of the Institute appointed by the 
President as Auditor. The Auditor shall report to the President. 

3. The Board of Directors shall have charge of the funds and of the affairs of the 
Institute, with the exception of those affairs specifically assigned to the President or to 
the Committee on Membership. The Board shall have authority to fill all vacancies 
ad interim, occurring among the Officers, Board of Directors, or in any of the Committees. 
The Board may appoint such other committees as may be required from time to time 
to carry on the affairs of the Institute. 

4. The Committee on Membership shall prepare and make available through the 
Secretary-Treasurer an announcement indicating the qualifications requisite for the 
different grades of membership. 

5. The Committee on Publications, under the general supervision of the Board of 
Directors, shall have charge of all matters connected with the publications of the Insti- 
tute, and of all books, pamphlets, manuscripts and other literary or scientific material 
collected by the Institute. Once a year this Committee shall cause to be printed in the 
Official Journal the Constitution and By-Laws and a classified list of all the Members 
and Fellows of the Institute. 


ARTICLE II 
Dues

1. Members shall pay five dollars at the time of admission to membership and shall 
receive the full current volume of the Official Journal. Thereafter, Members shall pay 
five dollars annual dues. The annual dues of Fellows shall be five dollars. The annual 
dues of Sustaining Members shall be fifty dollars. Honorary Members shall be exempt 
from all dues. 

2. Annual dues shall be payable on the first day of January of each year. 

3. The annual dues of a Fellow or Member include a subscription to the Official 
Journal. The annual dues of a Sustaining Member include two subscriptions to the 
Official Journal. 

4. It shall be the duty of the Secretary-Treasurer to notify by mail anyone whose dues 
may be six months in arrears, and to accompany such notice by a copy of this Article.
If such person fail to pay such dues within three months from the date of mailing such 
notice, the Secretary-Treasurer shall report the delinquent one to the Board of Directors, 
by whom the person’s name may be stricken from the rolls and all privileges of membership withdrawn. Such person may, however, be re-instated by the Board of Directors
upon payment of the arrears of dues. 

ARTICLE III 
Salaries 

1. The Institute shall not pay a salary to any Officer, Director, or member of any 
committee. 


ARTICLE IV 
Amendments 

1. These By-Laws may be amended in the same manner as the Constitution or by a 
majority vote at any regularly convened meeting of the Institute, if the proposed amend- 
ment has been previously approved by the Board of Directors. 
LIMITING DISTRIBUTIONS OF QUADRATIC AND BILINEAR FORMS

By William G. Madow


1. Introduction. In a previous paper [15], several generalizations of the theorem of Fisher [6, p. 97] and Cochran [2, p. 178] on the joint distribution of quadratic forms in normally and independently distributed random variables were derived. The chief purpose of this paper is to demonstrate that the Fisher-Cochran theorem and its generalizations are valid in the limit under conditions completely analogous to those under which the Laplace-Liapounoff theorem holds. Applications to the analysis of variance, periodogram analysis and multivariate analysis are discussed.

Our general procedure will be to find algebraic conditions on the matrices of quadratic and bilinear forms which enable us to assert that the limiting distributions of these forms are those which they would have had if the variables, the squares or products of which appear in their canonical forms, had been normally and independently distributed.³ One thing which makes this possible is the fact that many frequently used quadratic and bilinear forms have the same rank no matter what may be the number of variables of which they are functions. For example, the rank of the square of the arithmetic mean, $\bar{x}_n^2$, where

$$\bar{x}_n = \frac{1}{n}(x_1 + \dots + x_n),$$

is one for all values of n. In this case the quadratic form

$$\bar{x}_n^2 = \frac{1}{n^2} \sum_{\mu, \nu = 1}^{n} x_\mu x_\nu$$

is a function of the n variables $x_1, x_2, \dots, x_n$.
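The rank claim is easy to check numerically: the matrix of $\bar{x}_n^2$ has every entry equal to $1/n^2$, so all its rows are proportional and its rank is 1 whatever n is. The sketch below (hypothetical function names, random test vector) confirms both the rank and that the form reproduces $\bar{x}_n^2$.

```python
import numpy as np

def mean_square_form(n):
    """Matrix A with x' A x = xbar_n^2: every entry equals 1/n^2."""
    return np.full((n, n), 1.0 / n ** 2)

def rank_and_values(n, seed=0):
    """Rank of the form, its value at a random x, and xbar^2 at the same x."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)
    A = mean_square_form(n)
    return int(np.linalg.matrix_rank(A)), float(x @ A @ x), float(x.mean() ** 2)
```

The number of variables grows with n while the rank stays fixed, which is the property exploited in the limiting arguments that follow.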

In paragraph 2 we state the vector form of the Laplace-Liapounoff theorem 
and several corollaries. The joint limiting distributions of quadratic and 
bilinear forms are derived in paragraph 3. The final paragraph is devoted to a 
statement of a few applications of the theorems.


¹ Much of this research was done under a grant-in-aid from the Carnegie Corporation of New York.

² The material contained in this paper was presented in part to the American Statistical Association, December 28, 1937, and in part to the Institute of Mathematical Statistics, December 27, 1938.

³ We shall be chiefly concerned with conditions under which the limiting distributions are not themselves normal. If the limiting distributions are normal, then generally, under the conditions we state, the Laplace-Liapounoff theorem will have been directly applicable.
2. The Laplace-Liapounoff theorem.⁴ We shall first state some definitions and terminology which will be used throughout the paper.

If used as subscripts or superscripts, or as indices of summation or multiplication, the letters i, j will take on all integral values from 1 through p, the letters μ, ν will take on all integral values from 1 through n, the letters γ, δ will take on all integral values from 1 through m, the letter α will take on all integral values from 1 through k, and the letter β will take on all integral values from 1 through k - 1, unless explicit statement to the contrary is made.

The totality of all sets of r real numbers will be denoted by $R^r$. Thus $R^r$ is the combinatory product of the spaces $R^1, \dots, R^1$ (r times).

If $x_1, \dots, x_n$ are random variables, and if A is a proposition concerning $x_1, \dots, x_n$, then by $P\{A\}$ we shall mean "the probability that A." The distribution function of the random variables $x_1, \dots, x_n$ will be denoted by $F(x_1, \dots, x_n)$, i.e.

$$F(x_1^0, \dots, x_n^0) = P\{x_1 < x_1^0, \dots, x_n < x_n^0\}$$

for all sets of n real numbers $x_1^0, \dots, x_n^0$. Thus F will have an operational meaning in this paper.

If $A(x_1, \dots, x_n)$ is a function of $x_1, \dots, x_n$ defined on $R^n$ and measurable⁵ with respect to $F(x_1, \dots, x_n)$, then $E\{A(x_1, \dots, x_n)\}$ will be defined by the equation

$$E\{A(x_1, \dots, x_n)\} = \int_{R^n} A(x_1, \dots, x_n)\, dF(x_1, \dots, x_n),$$

where the integral is a Lebesgue-Stieltjes or Radon integral. Hence $|A(x_1, \dots, x_n)|$ is assumed to be integrable with respect to $F(x_1, \dots, x_n)$.

If $Q(y_1, \dots, y_p)$ is a single-valued measurable function of $y_1, \dots, y_p$ on $R^p$, and if $y_i$ is a real single-valued Borel measurable⁶ function of $x_1, \dots, x_n$ on $R^n$, then upon substituting for $y_1, \dots, y_p$ it is seen that $Q(y_1, \dots, y_p)$

⁴ Although the theorems will be stated in terms of probability distributions, Borel measurability, and Lebesgue-Stieltjes integrability, it may simplify the reading if the words "probability distributions" are replaced by probability densities or statistical distributions, "Borel measurability" by continuity, and "Lebesgue-Stieltjes integrability" by Riemann integrability.

⁵ A function $A(x_1, \dots, x_n)$ defined on $R^n$ is said to be measurable with respect to a distribution function $F(x_1, \dots, x_n)$ if the set $E(t)$ of all $x_1, \dots, x_n$ such that $A(x_1, \dots, x_n) < t$ is such that $\int_{E(t)} dF(x_1, \dots, x_n)$ is defined for all t.

⁶ All subsets of $R^n$ which may be formed from the totality of intervals of $R^n$ by repeated summations or multiplications of not more than a denumerable number of intervals of $R^n$, and $R^n$ itself, constitute the totality of Borel sets of $R^n$. The function $y(x_1, \dots, x_n)$, defined on $R^n$, is a Borel measurable function of $x_1, \dots, x_n$ on $R^n$ if the set of values of $x_1, \dots, x_n$ such that $y(x_1, \dots, x_n) < t$ is a Borel set for all t. The class of continuous functions is contained in the class of Borel measurable functions. For further details, see [3, chs. 1, 2], [11, ch. 8] and [17, chs. 1, 2, 8].
is a single valued measurable function, $A(x_1, \ldots, x_n)$, of $x_1, \ldots, x_n$ on $R^n$. If $x_1, \ldots, x_n$ are random variables, then $y_1, \ldots, y_p$ are random variables and⁵

(2.1) $$E\{Q(y_1, \ldots, y_p)\} = E\{A(x_1, \ldots, x_n)\}.$$

We shall call $E(x_i)$ the mean value of $x_i$, $\sigma_{ij}$ the covariance of $x_i$ and $x_j$, and $\sigma_{ii}$ or $\sigma_i^2$ the variance of $x_i$, where $\sigma_{ij} = E\{(x_i - Ex_i)(x_j - Ex_j)\}$.

The Laplace-Liapounoff, or Central Limit, theorem states conditions under which linear functions of random variables have a normal limiting distribution. The general characteristic of the proofs of the theorem is that conditions are placed on the random variables so that they may virtually be assumed to be bounded. The Lindeberg⁶ condition, which we shall use, is perhaps the least restrictive of all the conditions which require finite means and variances.

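The Lindeberg condition⁷, $\mathfrak{L}_p$: A set of random variables $X_{i\nu n}$ $(i = 1, \ldots, p;\ \nu = 1, \ldots, n)$ will be said to satisfy the Lindeberg condition $\mathfrak{L}_p$ if there exists, for any preassigned positive real numbers $\delta$ and $\epsilon$, a positive integer $n_0$ such that if $n > n_0$, then

$$\sum_{\nu=1}^{n}\int_{|Z_{\nu n}| > \delta} Z_{\nu n}^2\, dF(X_{1\nu n}, \ldots, X_{p\nu n}) < \epsilon,$$

where

$$Z_{\nu n}^2 = X_{1\nu n}^2 + X_{2\nu n}^2 + \cdots + X_{p\nu n}^2$$

and

$$\sigma_{i1n}^2 + \sigma_{i2n}^2 + \cdots + \sigma_{inn}^2 = 1.$$

If

$$X_{i\nu n} = \frac{x_{i\nu}}{s_{in}}, \quad \text{where } s_{in}^2 = \sigma_{i1}^2 + \cdots + \sigma_{in}^2,$$

and the $X_{i\nu n}$ satisfy $\mathfrak{L}_p$, then we shall say that the $x_{i\nu}$ satisfy $\mathfrak{L}_p$.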
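As an illustration of the Lindeberg condition in the case $p = 1$, the following Python sketch takes, purely as an assumed example, independent variables uniformly distributed on $(-1, 1)$, so that each variance is $1/3$ and $s_n^2 = n/3$. For such bounded variables the Lindeberg sum has the closed form $1 - a^3$ with $a = \delta\sqrt{n/3}$, and it is zero once $a \ge 1$. The sketch cross-checks this by numerical integration; the function names are hypothetical.

```python
import math

def lindeberg_sum_uniform(n: int, delta: float) -> float:
    """Lindeberg sum for n i.i.d. Uniform(-1, 1) variables normalized by s_n.

    Each x has variance 1/3, so s_n^2 = n/3 and X_vn = x_v / s_n.
    The sum n * E[(x/s_n)^2 ; |x| > delta*s_n] equals 1 - a^3 for a < 1,
    where a = delta * s_n, and 0 once a >= 1 (the tail set is empty).
    """
    a = delta * math.sqrt(n / 3.0)
    return max(0.0, 1.0 - a ** 3)

def lindeberg_sum_numeric(n: int, delta: float, steps: int = 200_000) -> float:
    """Same quantity by midpoint integration of (x/s_n)^2 against the density 1/2."""
    s = math.sqrt(n / 3.0)
    a = delta * s
    if a >= 1.0:
        return 0.0
    h = (1.0 - a) / steps
    # integral over a < x < 1 of x^2 * (1/2) dx; symmetry doubles it below
    one_side = sum(((a + (k + 0.5) * h) ** 2) * 0.5 * h for k in range(steps))
    return n * 2.0 * one_side / s ** 2

print(lindeberg_sum_uniform(3, 0.5), lindeberg_sum_numeric(3, 0.5))
print(lindeberg_sum_uniform(12, 0.5))
```

For $\delta = 0.5$ the sum vanishes from $n = 12$ on, which is the sense in which bounded variables make the condition hold almost trivially.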

Suppose that the random variables $y_{11}, \ldots, y_{pm_p}$ have a normal multivariate distribution with zero means and with covariance parameters $\sigma_{i\gamma j\delta}$, where

$$\sigma_{i\gamma j\delta} = E(y_{i\gamma}y_{j\delta}), \quad \gamma = 1, \ldots, m_i;\ \delta = 1, \ldots, m_j,$$

and denote the distribution function of $y_{11}, \ldots, y_{pm_p}$ by $N(y)$. Then we may state the Laplace-Liapounoff theorem as follows:

⁵ It is noted that $Q(y_1, \ldots, y_p)$ is integrated with respect to $F(y_1, \ldots, y_p)$ and $A(x_1, \ldots, x_n)$ is integrated with respect to $F(x_1, \ldots, x_n)$.

⁶ See Cramér [3, pp. 57, 60, 114], and the references there given.

⁷ It is not difficult to show that the Lindeberg condition will be satisfied if moments of order greater than two exist [3, p. 60], or if the conditions stated by Lévy [18, p. 207] and [14, p. 106] are satisfied.
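Theorem I. Suppose that, for each value of $n$, the random variables $x_{i\gamma\nu n}$ $(i = 1, \ldots, p;\ \gamma = 1, \ldots, m_i;\ \nu = 1, \ldots, n)$, which are independent for different values of $\nu$, have zero means and covariance parameters $\sigma_{i\gamma j\delta\nu n}$, where

$$\sigma_{i\gamma j\delta\nu n} = E(x_{i\gamma\nu n}x_{j\delta\nu n}).$$

Denote by $d_n$ the maximum of the variances $\sigma_{i\gamma i\gamma\nu n}$. If the functions $y_{i\gamma n}$ are defined by the equations

$$y_{i\gamma n} = \sum_\nu x_{i\gamma\nu n},$$

it follows that

$$\sigma_{i\gamma j\delta n} = E(y_{i\gamma n}y_{j\delta n}) = \sum_\nu \sigma_{i\gamma j\delta\nu n}.$$

If $\lim_{n\to\infty}\sigma_{i\gamma j\delta n} = \sigma_{i\gamma j\delta}$ and if $\lim_{n\to\infty} d_n = 0$, then a necessary and sufficient condition that, as $n \to \infty$, the limiting distribution⁸ of $y_{11n}, \ldots, y_{pm_pn}$ be $N(y)$ is that the Lindeberg condition be satisfied by the $x_{i\gamma\nu n}$.

The proof of this theorem is omitted. It may readily be developed from the proofs of Cramér [3, pp. 57, 113].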

Before stating certain corollaries which are of interest, some additional 
definitions are necessary. 

Let $C_m, C_{m+1}, \ldots$ be a sequence of $m$-rowed real matrices

$$C_n = \|c_{\gamma\nu n}\|, \quad n = m, m + 1, \ldots,$$

and let the greatest of the absolute values of the elements of $C_n$ be denoted by $d_n$. The inner product of any two rows of $C_n$ will be denoted by $\rho_{\gamma\delta n}$, i.e.

$$\rho_{\gamma\delta n} = \sum_\nu c_{\gamma\nu n}c_{\delta\nu n}.$$

Let $X_1, X_2, \ldots$ be a sequence of random vectors of $p$ components defined on $R^p$, and let the components of $X_\nu$ be denoted by $x_{1\nu}, \ldots, x_{p\nu}$. Let the components of the chance matrix $Y_n = \|y_{i\gamma n}\|$, which has $p$ rows and $m$ columns, be defined by the equations

(2.2) $$y_{i\gamma n} = \sum_\nu c_{\gamma\nu n}x_{i\nu}$$

for each value of $n$ $(n = m, m + 1, \ldots;\ m > p)$.

⁸ The distribution functions $F(X_n)$ will be said to converge to the distribution function $F(X)$ if and only if

$$\lim_{n\to\infty}\int_{-\infty}^{X} dF(X_n) = F(X)$$

for every $X$ at which $F(X)$ is continuous. If $F(X)$ is continuous throughout $R^p$, then the convergence is uniform.
Suppose that

(2.3) $$E(x_{i\nu}) = 0$$

and

(2.4) $$E(x_{i\mu}x_{j\nu}) = \sigma_{ij}\delta_{\mu\nu},$$

where $\delta_{\mu\nu} = 1$ if $\mu = \nu$ and $\delta_{\mu\nu} = 0$ if $\mu \neq \nu$. (There should be no confusion of this use of the letter $\delta$ with its use as an index.) It is easy to see that if the $c_{\gamma\nu n}$ are real numbers, then

$$E(y_{i\gamma n}) = 0$$

and

$$E(y_{i\gamma n}y_{j\delta n}) = \sigma_{ij}\rho_{\gamma\delta n}.$$

Let the determinant of the positive definite symmetric matrix $(\sigma) = \|\sigma_{ij}\|$ be denoted by $\sigma$. Let the inverse matrix of $(\sigma)$ be denoted by $(\sigma)^{-1} = \|\sigma^{ij}\|$, where $\sigma^{ij}$ is the cofactor of $\sigma_{ij}$ in $(\sigma)$ divided by $\sigma$. The determinant of $(\sigma)^{-1}$ is $\sigma^{-1}$.

By $N_d(x_1, \ldots, x_p; (\sigma))$ we shall mean the normal probability density with zero means and covariance parameters $\sigma_{ij}$, i.e.,

$$N_d(x_1, \ldots, x_p; (\sigma)) = (2\pi)^{-p/2}\sigma^{-1/2}\exp\Big[-\tfrac12\sum_{i,j}\sigma^{ij}x_ix_j\Big], \quad (-\infty < x_i < \infty),$$

where $(\sigma)$ is a positive definite matrix. If the random variables $x_1, \ldots, x_p$ have probability density $N_d(X; (\sigma)) = N_d(x_1, \ldots, x_p; (\sigma))$, where $X$ is a vector, then we shall say that $X$ has a distribution function $N(X; (\sigma))$, i.e.

$$N(X; (\sigma)) = \int_{-\infty}^{x_1}\cdots\int_{-\infty}^{x_p} N_d(t_1, \ldots, t_p; (\sigma))\, dt_1\cdots dt_p.$$
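The density $N_d$ can be checked numerically. The Python sketch below evaluates it for $p = 2$ only (an assumed restriction that keeps the inverse and determinant explicit), forming $(\sigma)^{-1}$ from cofactors divided by $\sigma$ exactly as in the definition above. The function names are hypothetical.

```python
import math

def normal_density_2d(x1: float, x2: float, s11: float, s12: float, s22: float) -> float:
    """N_d(x1, x2; (sigma)) with zero means, for a 2x2 positive definite (sigma)."""
    det = s11 * s22 - s12 * s12                  # the determinant sigma
    if s11 <= 0 or det <= 0:
        raise ValueError("(sigma) must be positive definite")
    # (sigma)^{-1}: cofactor of sigma_ij divided by sigma
    i11, i12, i22 = s22 / det, -s12 / det, s11 / det
    quad = i11 * x1 * x1 + 2.0 * i12 * x1 * x2 + i22 * x2 * x2
    return (2.0 * math.pi) ** -1 * det ** -0.5 * math.exp(-0.5 * quad)

def total_mass(s11: float, s12: float, s22: float, half_width: float = 8.0, h: float = 0.05) -> float:
    """Midpoint-rule integral of the density over a large square (should be close to 1)."""
    k = int(2 * half_width / h)
    pts = [-half_width + (j + 0.5) * h for j in range(k)]
    return sum(normal_density_2d(a, b, s11, s12, s22) for a in pts for b in pts) * h * h

print(total_mass(1.0, 0.5, 2.0))
```

When $\sigma_{12} = 0$ the density factors into two univariate normal densities, which gives an independent check of the cofactor formula.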


Inasmuch as certain hypotheses will be used on several occasions in this 
paper, they are stated here. 

If $X_1, X_2, \ldots$ are independently distributed, if (2.3) and (2.4) hold and if the $x$'s satisfy the condition $\mathfrak{L}_p$, then we shall say that $\mathfrak{X}_p$ is true.

If $C_n$ is such that, for all $n$, the equations $\rho_{\gamma\delta n} = \delta_{\gamma\delta}$ are true, we shall say that $\mathfrak{C}$ is true.

The following corollary is useful in deriving limiting distributions in the analysis of variance.

Corollary I. Let $\mathfrak{X}_p$ and $\mathfrak{C}$ be true. Then a sufficient condition that

$$\lim_{n\to\infty} F(Y_n) = \prod_\gamma N(y_{1\gamma}, \ldots, y_{p\gamma}; (\sigma))$$

is $\lim_{n\to\infty} d_n = 0$.

The proof is based on the fact that the $x_{i\gamma\nu n}$ of Theorem I are given by $c_{\gamma\nu n}x_{i\nu}$. The details are omitted.
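A small Monte Carlo experiment illustrates Corollary I in the case $p = m = 1$. The Python sketch below assumes, for illustration only, a single row $c_{1\nu n} = \sqrt{2/n}\cos(2\pi\nu/n)$, which has unit length and $d_n = \sqrt{2/n}$. It also assumes skewed variables $x_\nu = e_\nu - 1$ with $e_\nu$ exponential, so each $x_\nu$ has mean 0 and variance 1. The simulated $y_{11n}$ should then be close to standard normal for moderate $n$. Names and parameters are hypothetical.

```python
import math
import random
import statistics

def simulate_linear_form(n: int, reps: int, seed: int = 0):
    """Draw reps values of y = sum_v c_v x_v for one orthonormal row c."""
    rng = random.Random(seed)
    c = [math.sqrt(2.0 / n) * math.cos(2.0 * math.pi * v / n) for v in range(1, n + 1)]
    ys = []
    for _ in range(reps):
        # x_v = Exp(1) - 1: independent, mean 0, variance 1, clearly non-normal
        ys.append(sum(cv * (rng.expovariate(1.0) - 1.0) for cv in c))
    return c, ys

c, ys = simulate_linear_form(n=200, reps=4000)
row_norm = sum(cv * cv for cv in c)               # rho_11n, should be 1
d_n = max(abs(cv) for cv in c)                    # sqrt(2/200) = 0.1
share_below = sum(y < 1.645 for y in ys) / len(ys)  # near 0.95 if y is close to N(0, 1)
print(round(row_norm, 6), round(d_n, 4), round(statistics.variance(ys), 2), round(share_below, 3))
```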

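The $pm$-rowed square matrix $(\tau) = \|\tau_{rs}\|$ is defined as follows. If $r \le m$ and $s \le m$, then $\tau_{rs} = \sigma_{11}\rho_{rs}$. If $km < r \le (k+1)m$ and $lm < s \le (l+1)m$ $(l, k = 0, \ldots, p - 1)$, then $\tau_{rs} = \sigma_{k+1,l+1}\rho_{r-km,\,s-lm}$. The inverse matrix of $(\tau)$, and the determinants of $(\tau)$ and $(\tau)^{-1}$, are defined as are $(\sigma)^{-1}$, $\sigma$ and $\sigma^{-1}$.

Corollary II. Let $\mathfrak{X}_p$ be true, and let

$$\lim_{n\to\infty}\rho_{\gamma\delta n} = \rho_{\gamma\delta}, \quad \rho_{\gamma\gamma} = 1.$$

Then, if $\lim d_n = 0$, it follows that

$$\lim_{n\to\infty} F(Y_n) = F(Y),$$

where $F(Y)$ is the distribution function determined by the probability density

$$(2\pi)^{-pm/2}\tau^{-1/2}\exp\Big[-\tfrac12\sum_{r,s}\tau^{rs}y_{k+1,\,r-km}\,y_{l+1,\,s-lm}\Big],$$

where, if $r \le m$ and $s \le m$, then $k = 0$, $l = 0$; if $r \le m$ and $m < s \le 2m$, then $k = 0$, $l = 1$; and so on.

The proof is omitted.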
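The block rule defining $(\tau)$ says that the $(k+1, l+1)$ block of order $m$ is $\sigma_{k+1,l+1}$ times the matrix $\|\rho_{\gamma\delta}\|$; in modern terms $(\tau)$ is the Kronecker product of $(\sigma)$ and $\|\rho_{\gamma\delta}\|$. The Python sketch below, with hypothetical helper names and small made-up matrices, builds $(\tau)$ directly from the index rule and compares it with a Kronecker product.

```python
def tau_from_rule(sigma, rho):
    """tau_rs = sigma_{k+1,l+1} * rho_{r-km, s-lm} for km < r <= (k+1)m, lm < s <= (l+1)m."""
    p, m = len(sigma), len(rho)
    tau = [[0.0] * (p * m) for _ in range(p * m)]
    for r in range(1, p * m + 1):
        for s in range(1, p * m + 1):
            k, l = (r - 1) // m, (s - 1) // m      # block indices of row r and column s
            tau[r - 1][s - 1] = sigma[k][l] * rho[r - k * m - 1][s - l * m - 1]
    return tau

def kronecker(a, b):
    """Kronecker product of two square matrices given as lists of rows."""
    return [[a[i][j] * b[r][s] for j in range(len(a)) for s in range(len(b))]
            for i in range(len(a)) for r in range(len(b))]

sigma = [[2.0, 0.3], [0.3, 1.0]]                             # p = 2, illustrative values
rho = [[1.0, 0.2, 0.0], [0.2, 1.0, 0.1], [0.0, 0.1, 1.0]]    # m = 3, rho_gg = 1
tau = tau_from_rule(sigma, rho)
print(tau == kronecker(sigma, rho))
```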

If $Z_1, \ldots, Z_t$ are random variables, then $F(X_1, \ldots, X_k \mid Z_1, \ldots, Z_t)$ is the distribution function of the random vectors $X_1, \ldots, X_k$ for fixed values of $Z_1, \ldots, Z_t$, i.e. for any fixed values of $Z_1, \ldots, Z_t$,

$$P\{X_1 < X_1^0, \ldots, X_k < X_k^0\} = F(X_1^0, \ldots, X_k^0 \mid Z_1, \ldots, Z_t).$$

We shall now assume that the elements $c_{\gamma\nu n}$ of the matrix $C_n$ are Borel measurable functions of a set of random variables⁹ $Z_1, \ldots, Z_{s_n}$. Then the matrix $C_n$ may be called a random matrix defined on a space $W_n$ which is the combinatory product of the spaces on which $Z_1, \ldots, Z_{s_n}$ are defined. If, for each value of $n$, and for all $X^n$ and $Z^n$, the equation

(2.5) $$F(X^n, Z^n) = F(Z^n)\cdot\prod_\nu F(X_\nu \mid Z^n)$$

is satisfied, then we shall say that $\mathfrak{F}$ is true. It is obvious that sufficient conditions for the truth of $\mathfrak{F}$ are

$$F(X^n, Z^n) = F(Z^n)\cdot\prod_\nu F(X_\nu),$$

or, if $s_n > n$,

$$F(X^n, Z^n) = F(Z_{n+1}, \ldots, Z_{s_n})\cdot\prod_\nu F(X_\nu, Z_\nu),$$

⁹ The symbol $X^n$ will stand for the set of variables $X_1, \ldots, X_n$, and the symbol $Z^n$ will stand for the set of variables $Z_1, \ldots, Z_{s_n}$.
or, if $s_n < n$,

$$F(X^n, Z^n) = \prod_{\nu=1}^{s_n} F(X_\nu, Z_\nu)\cdot\prod_{\nu=s_n+1}^{n} F(X_\nu).$$

Inasmuch as we shall often use Fubini's theorem, it is stated here.¹⁰

Theorem II. Let the distribution function of $X^n, Z^n$ be $F(X^n, Z^n)$, let the distribution function of $X^n$ for fixed values of $Z^n$ be $F(X^n \mid Z^n)$, and let the distribution function of $Z^n$ be $F(Z^n)$. Then if $A(X^n, Z^n)$ is measurable with respect to $F(X^n, Z^n)$ and if

$$\int_{R^{pn}\times W_n} |A(X^n, Z^n)|\, dF(X^n, Z^n) < \infty,$$

it follows that

$$\int_{R^{pn}} |A(X^n, Z^n)|\, dF(X^n \mid Z^n) < \infty$$

for almost all¹¹ sets of values of $Z^n$, and

$$\int_{R^{pn}\times W_n} A(X^n, Z^n)\, dF(X^n, Z^n) = \int_{W_n}\Big[\int_{R^{pn}} A(X^n, Z^n)\, dF(X^n \mid Z^n)\Big]dF(Z^n).$$

In Corollary I an important condition was that the maximum of the absolute values of the elements of $C_n$ should approach zero as $n$ increased. In order to obtain a similar condition when the elements of $C_n$ are random variables, we shall define the function $d(C_n)$ as follows: for each value of $Z^n$, let $d(C_n)$ be the maximum of the absolute values of the elements of $C_n$. We shall denote $d(C_n)$ by $d_n$. If the elements of $C_n$ are Borel measurable functions, then $d_n$ is a Borel measurable function of $Z^n$. Hence $d_n$ is a random variable defined on $W_n$.

A sequence of random variables $d_1, d_2, \ldots$ is said to converge in probability to zero if, given $\epsilon > 0$,

$$\lim_{n\to\infty} P\{|d_n| > \epsilon\} = 0.$$

If the sequence of functions $d_m, d_{m+1}, \ldots$ converges in probability to zero, we shall say that $\mathfrak{Z}$ is true.
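To see convergence in probability of $d_n$ when $C_n$ is random, suppose (purely as an assumed example) that $C_n$ has a single row $u/|u|$, where $u = (u_1, \ldots, u_n)$ has independent standard normal components playing the role of $Z^n$. Then $d_n = \max_\nu|u_\nu|/|u|$. The Python sketch below estimates $P\{d_n > \epsilon\}$ by simulation, and the estimate should fall toward zero as $n$ grows. Names are hypothetical.

```python
import math
import random

def prob_d_exceeds(n: int, eps: float, reps: int = 1000, seed: int = 1) -> float:
    """Monte Carlo estimate of P{d_n > eps} for the random unit row u/|u|."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        u = [rng.gauss(0.0, 1.0) for _ in range(n)]    # stands in for Z^n
        norm = math.sqrt(sum(t * t for t in u))
        d_n = max(abs(t) for t in u) / norm             # d(C_n) for this row
        hits += d_n > eps
    return hits / reps

for n in (10, 50, 400):
    print(n, prob_d_exceeds(n, eps=0.3))
```

For $n = 10$ the estimate is always 1, because $d_n \ge n^{-1/2} > 0.3$ for every unit row of length 10.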

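If $\mathfrak{F}$ is true, and if, for almost all values of $Z^n$, we have

(2.6) $$\int_{R^p} x_{i\nu}\, dF(X_\nu \mid Z^n) = 0,$$

(2.7) $$\int_{R^p} x_{i\nu}x_{j\nu}\, dF(X_\nu \mid Z^n) = \sigma_{ij},$$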


¹⁰ Proofs of Fubini's theorem with the required amount of generality will be found in [6, p. 101] and [14, p. 73].

¹¹ A proposition concerning random variables is said to be true for almost all values of the variables if it is true for all values of the variables, except perhaps for a set of probability zero with respect to the distribution function of the random variables.
and the condition $\mathfrak{L}_p$ is satisfied with respect to the $X_\nu$ and the distribution functions $F(X_\nu, Z^n)$, then we shall say that $\mathfrak{X}_p^*$ is true.

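If

(2.8) $$\sum_\nu\int c_{\gamma\nu n}c_{\delta\nu n}x_{i\nu}x_{j\nu}\, dF(X_\nu, Z^n) = \sigma_{ij}\delta_{\gamma\delta},$$

then we shall say that $\mathfrak{C}^*$ is true. It is noted that if $\mathfrak{F}$ and (2.7) are true, then $\mathfrak{C}^*$ is true if $\mathfrak{C}$ is true for almost all sets of fixed values of $Z^n$.

Corollary III. Let $\mathfrak{C}^*$, $\mathfrak{F}$ and $\mathfrak{X}_p^*$ be true. Then, if $\mathfrak{Z}$ is true, it follows that

$$\lim_{n\to\infty} F(Y_n) = \prod_\gamma N(y_{1\gamma}, \ldots, y_{p\gamma}; (\sigma)).$$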


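Proof. It is necessary to show that the condition $\mathfrak{L}_{pm}$ is satisfied by the variables $c_{\gamma\nu n}x_{i\nu}$ if the condition $\mathfrak{L}_p$ is satisfied by the variables $x_{i\nu}$. It is also necessary to show that the condition $\mathfrak{Z}$ implies that the maximum of the variances (the $d_n$ of Theorem I) tends to zero when the $x_{i\gamma\nu n}$ of Theorem I are set equal to the $c_{\gamma\nu n}x_{i\nu}$ of Corollary III.

If we let $A_{\nu n}^2 = \sum_{\gamma,i}(c_{\gamma\nu n}x_{i\nu})^2$, $A_n^2 = \sum_\nu A_{\nu n}^2$, and let $s_n^2 = E(A_n^2)$, then, by (2.8), it is true that

$$s_n^2 = \sum_{\gamma,i}\sigma_{ii} = m\sum_i\sigma_{ii}.$$

From $\mathfrak{X}_p^*$ and the fact that, for sufficiently large $n$, $|d_n(Z^n)| < 1$ for almost all $Z^n$, we have, for any preassigned $\epsilon$ and $\delta$,

$$\sum_\nu\int_{A_{\nu n} > \delta} A_{\nu n}^2\, dF(X_\nu, Z^n) < \epsilon$$

for sufficiently large $n$. The reason is that $A_{\nu n}^2 = \big(\sum_\gamma c_{\gamma\nu n}^2\big)\sum_i x_{i\nu}^2 \le m\,d_n^2\sum_i x_{i\nu}^2$, so the set of $X_\nu$ and $Z^n$ for which $A_{\nu n} > \delta$ is contained in the set for which $\sum_i x_{i\nu}^2 > \delta^2/(m\,d_n^2)$. Hence the condition $\mathfrak{L}_{pm}$ is satisfied by the random variables $c_{\gamma\nu n}x_{i\nu}$ with respect to the distribution functions $F(X_\nu, Z^n)$.

We now show that

$$\lim_{n\to\infty}\Big[\max_\nu E\{(c_{\gamma\nu n}x_{i\nu})^2\}\Big] = 0.$$

It is clearly true that

$$E\{(c_{\gamma\nu n}x_{i\nu})^2\} \le \int d_n^2x_{i\nu}^2\, dF(X_\nu, Z^n).$$

Since $d_n$ converges in probability to zero, and since $d_n^2 < 1$ for almost all $Z^n$, we can, for any $\epsilon > 0$, take $n_0$ so large that if $n > n_0$, then $P\{d_n^2 > \tfrac12\epsilon\} < \tfrac12\epsilon$. If $P$ is the set on which $d_n^2 > \tfrac12\epsilon$, we then have for all $n > n_0$, using (2.7),

$$E\{(c_{\gamma\nu n}x_{i\nu})^2\} \le \int_{W_n - P}\Big[\int_{R^p} d_n^2x_{i\nu}^2\, dF(X_\nu \mid Z^n)\Big]dF(Z^n) + \int_P\Big[\int_{R^p} x_{i\nu}^2\, dF(X_\nu \mid Z^n)\Big]dF(Z^n) < \epsilon\,\sigma_{ii},$$

and this inequality is satisfied for all $\nu$.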







The following discussion is useful in obtaining the limiting distributions of statistics which occur in multivariate statistical analysis.

The letter $f$ will assume all integral values from 1 through $s$, the letters $\mu$, $\nu$ will assume all integral values from 1 through $n_f$, and the letters $\gamma$, $\delta$ will assume all integral values from 1 through $m_f$, for any $f$.

Let $X_1^f, X_2^f, \ldots$ be, for any fixed $f$, a sequence of random vectors of $p_f$ components defined on $R^{p_f}$, and let the random vectors $X_1^f, X_2^f, \ldots$ be independently distributed for any fixed $f$.

If, for each set of values of $n_1, \ldots, n_s$ ($s_n$ being a function of $n_1, \ldots, n_s$),

$$F(X_1^1, \ldots, X_{n_s}^s, Z_1, \ldots, Z_{s_n}) = F(Z^n)\cdot\prod_f\prod_\nu F(X_\nu^f \mid Z^n),$$

we shall say that $\mathfrak{F}_s$ is true.

Let, for any fixed value of $f$, the matrix¹² $C_n^f = \|c_{\gamma\nu n}^f\|$, where the $c_{\gamma\nu n}^f$ are Borel measurable functions of $X^k$ $(k < f)$ and¹³ $Z^n$, have the same properties as $C_n$, and let $d(C_n^f)$ be the same function of $C_n^f$ that $d(C_n)$ is of $C_n$. We shall denote $d(C_n^f)$ by $d_n^f$.

Let

$$y_{i\gamma n}^f = \sum_\nu c_{\gamma\nu n}^fx_{i\nu}^f,$$

and let $Y_n^f = \|y_{i\gamma n}^f\|$.

For fixed $f$, the $p_f$-rowed square matrix $(\sigma^f)$, its inverse, and so on are defined as were the same functions of the $\sigma_{ij}$ earlier in this section, but with $\sigma_{ij}^f$ replacing $\sigma_{ij}$, where

$$E(x_{i\nu}^f) = 0$$

and

$$E(x_{i\nu}^fx_{j\nu}^f) = \sigma_{ij}^f.$$

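If $\mathfrak{F}_s$ is true, and if for almost all values of $Z^n$ we have

(2.9) $$\int_{R^{p_f}} x_{i\nu}^f\, dF(X_\nu^f \mid Z^n) = 0,$$

(2.10) $$\int_{R^{p_f}} x_{i\nu}^fx_{j\nu}^f\, dF(X_\nu^f \mid Z^n) = \sigma_{ij}^f,$$

and the condition $\mathfrak{L}_{p_f}$ is satisfied with respect to the $X_\nu^f$ and the distribution functions $F(X_\nu^f, Z^n)$, then we shall say that $\mathfrak{X}_{p_f}^f$ is true.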


If

(2.11) $$\sum_\nu\int c_{\gamma\nu n}^fc_{\delta\nu n}^fx_{i\nu}^fx_{j\nu}^f\, dF(X_\nu^f, Z^n) = \sigma_{ij}^f\delta_{\gamma\delta},$$

¹² The superscripts $f$ and $k$ will not indicate multiplication but will only be indices.
¹³ See footnote 11.
then we shall say that $\mathfrak{C}^f$ is true. It is noted that if $\mathfrak{F}_s$ and (2.10) are true, then $\mathfrak{C}^f$ is true if $\mathfrak{C}$ is true for almost all sets of fixed values of $X^k$ $(k < f)$ and $Z^n$.

If $d_n^f$ converges in probability to zero as $n$ increases, we shall say that $\mathfrak{Z}_f$ is true.

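Corollary IV. Let $\mathfrak{F}_s$ and $\mathfrak{X}_{p_1}^1, \ldots, \mathfrak{X}_{p_s}^s$ be true. Then, if $\mathfrak{C}^1, \ldots, \mathfrak{C}^s$ and $\mathfrak{Z}_1, \ldots, \mathfrak{Z}_s$ are true, it follows that

$$\lim F(Y_n^1, \ldots, Y_n^s) = \prod_f F(Y^f),$$

where

$$F(Y^f) = \prod_\gamma N(y_{1\gamma}^f, \ldots, y_{p_f\gamma}^f; (\sigma^f)).$$

The proof is almost identical with the proof of Corollary III, of which this corollary is an extension.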

It is remarked that if the statistics whose limiting distributions are desired are associated with the normal distribution, as are most statistics studied, then Corollary IV may not be the best tool to use. This is a consequence of the fact that such statistics are generally expressible as functions of uncorrelated random variables and hence are more simply discussed using Corollary I.

3. Limiting distributions of quadratic and bilinear forms. We first assume the coefficients of the forms to be constants. For each set of values of $i$, $j$, and $n$, the matrix of the bilinear form with coefficients which are real numbers,

(3.1) $$b_{ijn} = \sum_{\mu,\nu} a_{\mu\nu n}x_{i\mu}x_{j\nu},$$

will be denoted by $A_n$, and the rank of $A_n$ will be denoted by $m$. The maximum of the absolute values of the elements of $A_n$ will be denoted by $b_n$. We shall assume that there exists an orthogonal transformation,

(3.2) $$y_{i\gamma n} = \sum_\nu c_{\gamma\nu n}x_{i\nu},$$

of $x_{i1}, \ldots, x_{in}$ such that

(3.3) $$b_{ijn} = \sum_\gamma \lambda_\gamma y_{i\gamma n}y_{j\gamma n},$$

where the coefficients $\lambda_\gamma$ are non-negative.¹⁴

Lemma I. If $d_n$ is the maximum of the absolute values of the elements $c_{\gamma\nu n}$, then a necessary and sufficient condition that $\lim_{n\to\infty} b_n = 0$ is $\lim_{n\to\infty} d_n = 0$.

¹⁴ Our theorems will not be applicable if some of the $\lambda_\gamma$ are negative and some are positive. However, if all the $\lambda_\gamma$ are non-positive, then the theorems will remain true.
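Proof. From (3.1), (3.2) and (3.3) it follows that

$$a_{\mu\nu n} = \sum_\gamma \lambda_\gamma c_{\gamma\mu n}c_{\gamma\nu n}.$$

Hence $b_n \ge a_{\mu\mu n} \ge \lambda_\gamma c_{\gamma\mu n}^2$ and $|a_{\mu\nu n}| \le d_n^2\big(\sum_\gamma\lambda_\gamma\big)$. The remainder of the proof is obvious.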
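The two inequalities in the proof of Lemma I can be checked numerically. In the Python sketch below, the orthonormal rows $c_{\gamma\nu n}$ are taken, as an assumption for illustration, to be normalized cosine and sine sequences, and the $\lambda_\gamma$ are arbitrary non-negative numbers. The sketch forms $a_{\mu\nu n} = \sum_\gamma\lambda_\gamma c_{\gamma\mu n}c_{\gamma\nu n}$ and compares $b_n$ with $d_n$. Helper names are hypothetical.

```python
import math

def fourier_rows(n: int, m: int):
    """m orthonormal rows of length n built from cos/sin pairs (assumes m < n - 1)."""
    rows, k = [], 1
    while len(rows) < m:
        for f in (math.cos, math.sin):
            if len(rows) < m:
                rows.append([math.sqrt(2.0 / n) * f(2.0 * math.pi * k * v / n) for v in range(1, n + 1)])
        k += 1
    return rows

def lemma_one(n: int, lam):
    """Return (b_n, d_n, rows) for a_{mu nu} = sum_g lam_g c_{g mu} c_{g nu}."""
    rows = fourier_rows(n, len(lam))
    a = [[sum(l * r[mu] * r[nu] for l, r in zip(lam, rows)) for nu in range(n)] for mu in range(n)]
    b_n = max(abs(x) for row in a for x in row)
    d_n = max(abs(x) for row in rows for x in row)
    return b_n, d_n, rows

lam = [1.0, 2.0, 0.5]
b60, d60, rows60 = lemma_one(60, lam)
b240, d240, _ = lemma_one(240, lam)
print(b60, d60 ** 2 * sum(lam), b240)
```

Both $b_n$ and $d_n$ shrink as $n$ grows, as the lemma requires.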

The following theorem will be the basis for a large sample analogue of Wishart's distribution.

Theorem III. Let $\mathfrak{X}_p$ be true. Then a sufficient condition that

$$\lim_{n\to\infty} F(Y_n) = \prod_\gamma N(y_{1\gamma}, \ldots, y_{p\gamma}; (\sigma)),$$

where $b_{ijn} = \sum_\gamma\lambda_\gamma y_{i\gamma n}y_{j\gamma n}$, is $\lim_{n\to\infty} b_n = 0$.

Proof. According to Lemma I, the fact that $\lim b_n = 0$ implies that $\lim d_n = 0$. Since the transformation (3.2) is orthogonal, the $c_{\gamma\nu n}$ are such that $\mathfrak{C}$ is true. Hence the hypotheses of Corollary I are satisfied and the theorem is proved.

Before stating the corollary to Theorem III, we shall prove an obvious lemma which is of constant service.

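Lemma II. Let $\lim_{n\to\infty} F(X_n) = F(X)$ at all points of continuity of $F(X)$, and let

$$g_{1n} = g_1(x_{1n}, \ldots, x_{pn}), \quad \ldots, \quad g_{kn} = g_k(x_{1n}, \ldots, x_{pn})$$

be Borel measurable functions of their indicated variables for each value of $n$, defined on $R^p$. Then

$$\lim_{n\to\infty} F(g_{1n}, \ldots, g_{kn}) = F(g_1, \ldots, g_k)$$

at all points of continuity of $F(g_1, \ldots, g_k)$, where $g_h = g_h(x_1, \ldots, x_p)$.

Proof. By (2.1), we have

(3.4) $$\int e^{i(t_1g_{1n} + \cdots + t_kg_{kn})}\, dF(g_{1n}, \ldots, g_{kn}) = \int_{R^p} e^{i(t_1g_1 + \cdots + t_kg_k)}\, dF_n(x_1, \ldots, x_p),$$

where, since $g_h(x_1, \ldots, x_p)$ is a Borel measurable function of $x_1, \ldots, x_p$, we know that $g_{1n}, \ldots, g_{kn}$ have a joint distribution function $F(g_{1n}, \ldots, g_{kn})$. Then, since $\lim F(X_n) = F(X)$ at all points of continuity of $F(X)$, we have¹⁵

$$\lim_{n\to\infty}\int_{R^p} e^{i(t_1g_1 + \cdots + t_kg_k)}\, dF_n(x_1, \ldots, x_p) = \int_{R^p} e^{i(t_1g_1 + \cdots + t_kg_k)}\, dF(x_1, \ldots, x_p)$$

uniformly in every $t_1, \ldots, t_k$ interval, since

$$\Big|\int_{R^p} e^{i(t_1g_1 + \cdots + t_kg_k)}\{dF_n(x_1, \ldots, x_p) - dF(x_1, \ldots, x_p)\}\Big| \le \int_{R^p} 1\cdot|dF_n(x_1, \ldots, x_p) - dF(x_1, \ldots, x_p)|,$$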
¹⁵ See Cramér [3, p. 80] and "Additional Note" at the end of the book.
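where $F_n(x_1, \ldots, x_p)$ stands for $F(x_{1n}, \ldots, x_{pn})$ when $x_i$ and $x_{in}$ have the same numerical values. It follows from (3.4) that

$$\lim_{n\to\infty}\int e^{i(t_1g_{1n} + \cdots + t_kg_{kn})}\, dF(g_{1n}, \ldots, g_{kn}) = \int e^{i(t_1g_1 + \cdots + t_kg_k)}\, dF(g_1, \ldots, g_k)$$

uniformly in every $t_1, \ldots, t_k$ interval, and consequently

$$\lim_{n\to\infty} F(g_{1n}, \ldots, g_{kn}) = F(g_1, \ldots, g_k)$$

at all points of continuity of $F(g_1, \ldots, g_k)$.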

The real valued function $G_d(x; n, c)$ will be defined by the equations

$$G_d(0; 0, c) = 1, \quad (-\infty < c < \infty),$$

$$G_d(x; n, c) = [\Gamma(\tfrac12 n)]^{-1}(2c)^{-\frac12 n}x^{\frac12 n - 1}\exp\Big(-\frac{x}{2c}\Big), \quad (0 < x < \infty;\ c > 0;\ n > 0),$$

and $G_d(x; n, c) = 0$ otherwise. The function $G(x; n, c)$ will be defined by the equation

$$G(x; n, c) = \int_{-\infty}^{x} G_d(t; n, c)\, dt.$$
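The density $G_d(x; n, c)$ is that of $c$ times a chi-square variable with $n$ degrees of freedom. The Python sketch below evaluates $G_d$ and confirms by the midpoint rule that it integrates to one and has mean $nc$. The function names are hypothetical, and the values $n = 4$, $c = 1.5$ are arbitrary.

```python
import math

def g_density(x: float, n: float, c: float) -> float:
    """G_d(x; n, c) for n > 0 and c > 0; zero for x <= 0."""
    if x <= 0.0:
        return 0.0
    return x ** (n / 2 - 1) * math.exp(-x / (2 * c)) / (math.gamma(n / 2) * (2 * c) ** (n / 2))

def midpoint(f, a: float, b: float, steps: int = 200_000) -> float:
    """Composite midpoint rule on [a, b]."""
    h = (b - a) / steps
    return h * sum(f(a + (k + 0.5) * h) for k in range(steps))

total = midpoint(lambda x: g_density(x, 4, 1.5), 0.0, 200.0)
mean = midpoint(lambda x: x * g_density(x, 4, 1.5), 0.0, 200.0)
print(round(total, 6), round(mean, 4))   # close to 1 and to n * c = 6
```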

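The real valued function $G_d(x_{11}, x_{12}, \ldots, x_{pp}; n, (\sigma))$ will be defined by the equations

$$G_d(x_{11}, \ldots, x_{pp}; n, (\sigma)) = \Big[2^{\frac12 np}\pi^{\frac14 p(p-1)}\sigma^{\frac12 n}\prod_{i=1}^{p}\Gamma\big(\tfrac12(n - i + 1)\big)\Big]^{-1}|x|^{\frac12(n-p-1)}\exp\Big(-\tfrac12\sum_{i,j}\sigma^{ij}x_{ij}\Big),$$

$(0 < x_{ii} < \infty;\ x_{ij}^2 < x_{ii}x_{jj})$, where $(\sigma)$ is positive definite and $|x|$ is the determinant $|x_{ij}|$, and $G_d(x_{11}, \ldots, x_{pp}; n, (\sigma)) = 0$ otherwise. The function $G(x_{11}, \ldots, x_{pp}; n, (\sigma))$ will be defined by the equation

$$G(x_{11}, \ldots, x_{pp}; n, (\sigma)) = \int_{-\infty}^{x_{11}}\int_{-\infty}^{x_{12}}\cdots\int_{-\infty}^{x_{pp}} G_d(t_{11}, \ldots, t_{pp}; n, (\sigma))\, dt_{11}\,dt_{12}\cdots dt_{pp}.$$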

We can now state the limiting distribution analogue of Wishart’s distribution. 
Corollary V. If $\mathfrak{X}_p$ is true, if $\lambda_\gamma = 1$, and if $m > p$, then

$$\lim_{n\to\infty} F(b_{11n}, b_{12n}, \ldots, b_{ppn}) = G(b_{11}, \ldots, b_{pp}; m, (\sigma)).$$

Proof. The conditions of Theorem III and Lemma II are satisfied. Obviously, for fixed $i$, the limiting distribution of $b_{iin}$ is $G(b_{ii}; m, \sigma_{ii})$, and if $i \neq j$, the limiting distribution of $b_{ijn}/m$ is the distribution of the covariance of $x_i$ and $x_j$ in a sample of $m$ independent pairs of observations.¹⁶
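For $p = 1$ and $\lambda_\gamma = 1$, Corollary V says that $b_{11n} = \sum_\gamma y_{1\gamma n}^2$ is asymptotically distributed as $\sigma_{11}$ times a chi-square variable with $m$ degrees of freedom, whatever the distribution of the $x_\nu$ (subject to the Lindeberg condition). The Python sketch below checks the first two moments by simulation. It assumes, for illustration, uniform $x_\nu$ with variance 1 and orthonormal cosine/sine rows; the names and sizes are hypothetical.

```python
import math
import random
import statistics

def sum_of_squares_forms(n: int, m: int, reps: int, seed: int = 5):
    """reps draws of b = sum over m orthonormal rows of (sum_v c_v x_v)^2."""
    rng = random.Random(seed)
    rows, k = [], 1
    while len(rows) < m:                       # cos/sin pairs; assumes m < n - 1
        rows.append([math.sqrt(2 / n) * math.cos(2 * math.pi * k * v / n) for v in range(1, n + 1)])
        if len(rows) < m:
            rows.append([math.sqrt(2 / n) * math.sin(2 * math.pi * k * v / n) for v in range(1, n + 1)])
        k += 1
    root3 = math.sqrt(3.0)
    out = []
    for _ in range(reps):
        x = [rng.uniform(-root3, root3) for _ in range(n)]   # mean 0, variance 1
        out.append(sum(sum(c * xv for c, xv in zip(row, x)) ** 2 for row in rows))
    return out

b = sum_of_squares_forms(n=100, m=3, reps=2000)
print(round(statistics.mean(b), 2), round(statistics.variance(b), 2))  # chi-square(3): 3 and 6
```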


**See Wishart and Bartlett, (L p. 366]. 

We proceed to the analogue for limiting distributions of one of our formulations of the Fisher-Cochran theorem. It is first desirable to give some additional definitions.

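We consider the bilinear forms

(3.5) $$b_{ijn}^\alpha = \sum_{\mu,\nu} a_{\mu\nu n}^\alpha x_{i\mu}x_{j\nu}, \quad (\alpha = 1, \ldots, k),$$

with real coefficients, and we denote the matrix of $b_{ijn}^\alpha$ by $A_n^\alpha$. The rank of $A_n^\alpha$ $(\alpha < k)$ is $m_\alpha$, and the rank of $A_n^k$ is $m_{kn}$. If the maximum of the absolute values of the elements of $A_n^1, \ldots, A_n^{k-1}$ is $b_n$, and if there exists an orthogonal transformation,

(3.6) $$y_{i\gamma n} = \sum_\nu c_{\gamma\nu n}x_{i\nu},$$

of $x_{i1}, \ldots, x_{in}$ such that

$$b_{ijn}^\alpha = \sum_\gamma\lambda_\gamma y_{i\gamma n}y_{j\gamma n},$$

where $\gamma$ assumes all integral values from $m_1 + \cdots + m_{\alpha-1} + 1$ through $m_1 + \cdots + m_\alpha$ and $\lambda_\gamma$ is non-negative, then it is easy to prove, as in Lemma I, that a necessary and sufficient condition that $\lim_{n\to\infty} b_n = 0$ is $\lim_{n\to\infty} d_n = 0$, where $d_n$ is the maximum of the absolute values of the elements $c_{\gamma\nu n}$.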

Lemma III. Let $m = m_1 + \cdots + m_{k-1}$ and let

(3.7) $$\sum_\alpha b_{ijn}^\alpha = \sum_\nu x_{i\nu}x_{j\nu}.$$

Then a necessary and sufficient condition that

(3.8) $$b_{ijn}^\alpha = \sum_\gamma y_{i\gamma n}y_{j\gamma n},$$

where the real linear functions $y_{i\gamma n}$ of $x_{i1}, \ldots, x_{in}$ are given by (3.6), the linear functions (3.6) not now being assumed to be orthogonal, is

$$m_{kn} = n - m.$$

Furthermore, the functions (3.6) are orthogonal.

The proof of this lemma for the case $p = 1$ is given in [16]. The procedure to follow in extending the lemma to the cases where $p > 1$ is given in [15, p. 473]. It is noted that this lemma is more general than the lemma in [15], inasmuch as we show that the orthogonality of the transformation is a consequence of our hypotheses and not one of the hypotheses.¹⁷
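For $p = 1$ and $k = 2$ the simplest instance of (3.7) is $n\bar{x}^2 + \sum_\nu(x_\nu - \bar{x})^2 = \sum_\nu x_\nu^2$, with matrices $J/n$ and $I - J/n$, where $J$ is the all-ones matrix. The Python sketch below uses exact rational arithmetic to verify that the two matrices add to the identity and that their ranks, $1$ and $n - 1$, add to $n$, as the rank condition of Lemma III requires. Helper names are hypothetical.

```python
from fractions import Fraction

def rank(matrix) -> int:
    """Rank by exact Gaussian elimination over the rationals."""
    rows = [[Fraction(v) for v in row] for row in matrix]
    r = 0
    for col in range(len(rows[0])):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(len(rows)):
            if i != r and rows[i][col] != 0:
                factor = rows[i][col] / rows[r][col]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r

def mean_and_deviation_forms(n: int):
    """Matrices of n * xbar^2 and sum (x_v - xbar)^2 as quadratic forms in x_1..x_n."""
    j = [[Fraction(1, n)] * n for _ in range(n)]
    dev = [[(1 if i == k else 0) - Fraction(1, n) for k in range(n)] for i in range(n)]
    return j, dev

j, dev = mean_and_deviation_forms(6)
print(rank(j), rank(dev))   # 1 and 5, adding to n = 6
```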


¹⁷ It is noted, however, that the increase in generality affects only the necessity, not the sufficiency, of the theorem.
Theorem IV. Let $\mathfrak{X}_p$, (3.7) and (3.8) be true for all values of $n$, and suppose that $\lim_{n\to\infty} b_n = 0$. Then

$$\lim_{n\to\infty} F(Y_n) = \prod_\gamma N(y_{1\gamma}, \ldots, y_{p\gamma}; (\sigma)),$$

where $Y_n = \|y_{i\gamma n}\|$ $(\gamma = 1, \ldots, m)$.

The proof is omitted.

Corollary VI. If the hypotheses of Theorem IV are assumed, and if $m_\beta > p$ $(\beta = 1, \ldots, h;\ h < k)$, then

$$\lim_{n\to\infty} F(b_{11n}^1, \ldots, b_{ppn}^h, y_{1,m_1+\cdots+m_h+1,n}, \ldots, y_{pmn}) = \prod_{\beta=1}^{h} G(b_{11}^\beta, \ldots, b_{pp}^\beta; m_\beta, (\sigma))\cdot\prod_{\gamma=m_1+\cdots+m_h+1}^{m} N(y_{1\gamma}, \ldots, y_{p\gamma}; (\sigma)).$$

If $p = 1$ in Theorem IV and Corollary VI, we have the large sample analogue of the Fisher-Cochran theorem.

We now discuss limiting distributions of random variables which are bilinear and quadratic forms in one set of chance variables for fixed values of other random variables. We consider the coefficients $a_{\mu\nu n}$ and $a_{\mu\nu n}^\alpha$ of $b_{ijn}$ and $b_{ijn}^\alpha$ to be random variables. Hence the matrices $A_n$ and $A_n^\alpha$ are random matrices.

To be more explicit, let $X_1^f, X_2^f, \ldots$ be a sequence of random vectors, the random vector $X_\nu^f$ having $p_f$ components $x_{i\nu}^f$ and being defined on $R^{p_f}$. The set of random vectors $X_\nu^f$ and $Z_1, \ldots, Z_{s_n}$ will be assumed to be independent.

For each value of $f$ the coefficients of the bilinear forms

(3.9) $$b_{ijn_f}^{\alpha f} = \sum_{\mu,\nu} a_{\mu\nu n_f}^{\alpha f}x_{i\mu}^fx_{j\nu}^f, \quad (i, j = 1, \ldots, p_f;\ \alpha = 1, \ldots, k_f)$$

will be assumed to be Borel measurable functions of the random vectors $X^1, \ldots, X^{f-1}$ and $Z_1, \ldots, Z_{s_n}$.

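The matrix of $b_{ijn_f}^{\alpha f}$ is denoted by $A_{n_f}^{\alpha f}$. The rank of $A_{n_f}^{\alpha f}$ is $m_{\alpha f}$ and the rank of $A_{n_f}^{k_ff}$ is $m_{k_ffn_f}$ for all sets of values of the $a_{\mu\nu n_f}^{\alpha f}$ except, perhaps, on a set $E_{n_f}$ which is such that $\lim P(E_{n_f}) = 0$.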

Let the function $b(A_{n_f}^f)$ be defined as follows: for each set of values of the $X^k$ and $Z^n$, let $b(A_{n_f}^f)$ be the maximum of the absolute values of the elements of $A_{n_f}^f$. We shall denote $b(A_{n_f}^f)$ by $b_{n_f}^f$. Obviously, $b_{n_f}^f$ is a Borel measurable function of the $X^k$ and $Z^n$. Hence

$$b_{n_f}^f = b(A_{n_f}^f)$$

is a random variable defined on $W \times X$.







For each value of $f$, and for almost all sets of fixed values of the $X^h$ $(h = 1, \ldots, f - 1)$, we shall assume that there exists an orthogonal transformation

(3.10) $$y_{i\gamma n_f}^f = \sum_\nu c_{\gamma\nu n_f}^fx_{i\nu}^f$$

of $x_{i1}^f, \ldots, x_{in_f}^f$ such that¹⁸

(3.11) $$b_{ijn_f}^{\alpha f} = \sum_\gamma y_{i\gamma n_f}^fy_{j\gamma n_f}^f,$$

where $\gamma$ assumes all integral values from $m_{1f} + \cdots + m_{\alpha-1,f} + 1$ through $m_{1f} + \cdots + m_{\alpha f}$. The coefficients $c_{\gamma\nu n_f}^f$ of the linear forms (3.10) are real single valued Borel measurable functions of the coefficients $a_{\mu\nu n_f}^{\alpha f}$ of the bilinear forms (3.9) for fixed values of the $X^k$ and $Z^n$. Let $c_{\gamma\nu n_f}^f$ be the same function of the $a_{\mu\nu n_f}^{\alpha f}$ that $c_{\gamma\nu n}$ is of the coefficients of the bilinear forms having constant coefficients. Furthermore, let $d_{n_f}^f$ be the same function of the matrix $C_{n_f}^f = \|c_{\gamma\nu n_f}^f\|$, where $m_f = m_{1f} + \cdots + m_{k_f-1,f}$, that $d_n$ is of $C_n$.

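Lemma IV. A necessary and sufficient condition that $b_{n_f}^f$ converge in probability to zero as $n$ increases is that $d_{n_f}^f$ converge in probability to zero as $n$ increases.

Proof. Since

$$a_{\mu\nu n_f}^{\alpha f} = \sum_\gamma c_{\gamma\mu n_f}^fc_{\gamma\nu n_f}^f,$$

we have

$$(k_f - 1)\,b_{n_f}^f \ge \sum_{\alpha=1}^{k_f-1} a_{\mu\mu n_f}^{\alpha f} \ge |c_{\gamma\mu n_f}^f|^2$$

and

$$|a_{\mu\nu n_f}^{\alpha f}| \le \sum_\gamma|c_{\gamma\mu n_f}^fc_{\gamma\nu n_f}^f| \le m_{\alpha f}\,(d_{n_f}^f)^2,$$

where $\gamma$ assumes all integral values from $m_{1f} + \cdots + m_{\alpha-1,f} + 1$ through $m_{1f} + \cdots + m_{\alpha f}$. The remainder of the proof is obvious.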

In proving Theorem V we shall use a generalization of Lemma III which is 
proved in [15, p. 473]. 

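Theorem V. Let $\mathfrak{X}_{p_1}^1, \ldots, \mathfrak{X}_{p_s}^s$ be true, and suppose that

$$\sum_{\alpha=1}^{k_f} b_{ijn_f}^{\alpha f} = \sum_{\nu=1}^{n_f} x_{i\nu}^fx_{j\nu}^f.$$

Then, if $b_{n_f}^f$ converges in probability to zero as $n$ increases and if $m_{k_ffn_f} = n_f - m_f$ for all values of $n_f$, it follows that

$$\lim F(y_{11n_1}^1, \ldots, y_{p_sm_sn_s}^s) = \prod_f\prod_\gamma N(y_{1\gamma}^f, \ldots, y_{p_f\gamma}^f; (\sigma^f)).$$

The proof is omitted.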


¹⁸ It is not necessary that the $\lambda_\gamma$ be set equal to one as in (3.11). It is only somewhat easier to state the results.
Corollary VII. If $m_{\alpha f} \ge p_f$, then

$$\lim F(b_{11n_1}^{11}, \ldots, b_{p_sp_sn_s}^{k_s-1,s}) = \prod_f\prod_{\alpha=1}^{k_f-1} G(b_{11}^{\alpha f}, \ldots, b_{p_fp_f}^{\alpha f}; m_{\alpha f}, (\sigma^f)).$$

The proof is omitted.

Finally, let us assume that the vectors $X_\nu^f$ for fixed $\nu$ are uncorrelated and for fixed $f$ are independent. By that, we shall mean that $E(x_{i\nu}^fx_{j\nu}^g) = \sigma_{ij}^f\delta_{fg}$ and that, for all $n$, the random vectors $X_\nu^f$ are independent for the same or different superscripts provided the subscripts are all different. Let us also assume that the coefficients of the forms (3.9) are real numbers. Thus we have weakened the hypotheses of Theorem V concerning the random vectors, and we have strengthened the hypotheses of Theorem V concerning the forms (3.9). We are generally concerned with the limiting distributions of statistics which occur in the analysis of the normal distribution, and many such statistics have been shown to be invariant under transformations into uncorrelated random variables.¹⁹ Hence Theorem VI and Corollary VIII will often be applicable.

Theorem VI. The statement of Theorem V is repeated.

Corollary VIII. The statement of Corollary VII is repeated.

Another extension of these theorems may be obtained by allowing all the $n_f$ to be equal, i.e. $n_1 = \cdots = n_s = n$, and by putting conditions on the forms (3.9) which enable us to say that, for fixed $i$, $f$, $\gamma$ and $n$, the random variables $c_{\gamma\nu n}^fx_{i\nu}^f$ $(\nu = 1, \ldots, n)$ are independently distributed. Theorem I could then be used to obtain a very general result. However, except for the case dealt with above, the condition of independence appears to be rather restrictive, and the theorem is omitted.

4. Applications. We first state the strong law of large numbers and a 
lemma which is very useful in the discussion of limiting distributions. 

A sequence of random variables $X_1, X_2, \ldots$ will be said to converge with probability one²⁰ to a random variable $X$ if

$$\lim_{n\to\infty} P\{|X_n - X| < \epsilon, |X_{n+1} - X| < \epsilon, \ldots, |X_{n+p} - X| < \epsilon\} = 1$$

for every value of $p > 0$, uniformly in $p$, for every positive number $\epsilon$. Upon setting $p = 1$, it is seen that convergence with probability one implies convergence in probability.

The strong law of large numbers²¹ asserts that if the independent random variables $X, X_1, X_2, \ldots$ all have the same distribution function, and if $E(X)$ is finite, then the sequence of arithmetic means $\frac1n\sum_{\nu=1}^{n} X_\nu$ converges with probability one to $E(X)$.

¹⁹ The regression transformation which yields the uncorrelated variables will be found in [16, p. 476, (3.2)].

²⁰ See Doob [4, p. 163], and Fréchet [9, p. 228].

²¹ See Doob [4, p. 163], and Fréchet [9, p. 259]. A complete proof is given by Fréchet.
Hence, if $E(x_{i\nu}) = 0$ and if $\sigma_{ij}$ is finite, then $\frac1n\sum_\nu x_{i\nu}x_{j\nu}$ converges with probability one to $\sigma_{ij}$. Since

$$\sum_\nu(x_{i\nu} - \bar{x}_{in})(x_{j\nu} - \bar{x}_{jn}) = \sum_\nu x_{i\nu}x_{j\nu} - n\bar{x}_{in}\bar{x}_{jn},$$

where $\bar{x}_{in}$ is the arithmetic mean of $x_{i1}, \ldots, x_{in}$, and since $\bar{x}_{in}$ converges with probability one to zero, it follows that $s_{ijn} = \frac1n\sum_\nu x_{i\nu}x_{j\nu} - \bar{x}_{in}\bar{x}_{jn}$ converges with probability one to $\sigma_{ij}$. It is, of course, assumed that the random variables $x_{1\nu}, \ldots, x_{p\nu}$ have the same joint distribution function for all values of $\nu$, and that the random vectors $X_1, X_2, \ldots$ are independently distributed. The passage from $\frac1n\sum_\nu x_{i\nu}x_{j\nu}$ to $s_{ijn}$ in the limit is an example of the possible uses of:

Lemma V. If $\varphi(t_1, \ldots, t_p)$ is a continuous function of $t_1, \ldots, t_p$, and if the sequence of random variables $x_{in}$ converges in probability (with probability one) to $x_i$, which may be a random variable or a constant, then the sequence of random variables $\varphi(x_{1n}, \ldots, x_{pn})$ converges in probability (with probability one) to $\varphi(x_1, \ldots, x_p)$, where some or all of the $x$'s may be constants. If $x_1, \ldots, x_p$ are constants, then $\varphi(t_1, \ldots, t_p)$ need only be continuous in the neighborhood of $x_1, \ldots, x_p$ and Borel measurable.

For a proof of part of this lemma, which may be extended to yield the entire proof, see Fréchet [9, p. 178].
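The passage from $\frac1n\sum_\nu x_{i\nu}x_{j\nu}$ to the sample covariance can be watched along a single simulated sequence. The Python sketch below assumes, for illustration, $x_{1\nu} = z_\nu$ and $x_{2\nu} = 0.6z_\nu + 0.8(e_\nu - 1)$ with $z_\nu$ standard normal and $e_\nu$ exponential, so that $\sigma_{12} = 0.6$. It accumulates running sums and evaluates the continuous function $\frac1n\sum x_{1\nu}x_{2\nu} - \bar{x}_{1n}\bar{x}_{2n}$ of three running means, in the manner of Lemma V. Names are hypothetical.

```python
import random

def running_covariance(checkpoints, seed: int = 2):
    """Return {n: s_12n} at the requested sample sizes along one simulated path."""
    rng = random.Random(seed)
    wanted = set(checkpoints)
    s1 = s2 = s12 = 0.0
    out = {}
    for n in range(1, max(wanted) + 1):
        z = rng.gauss(0.0, 1.0)
        x1 = z
        x2 = 0.6 * z + 0.8 * (rng.expovariate(1.0) - 1.0)   # sigma_12 = 0.6
        s1 += x1
        s2 += x2
        s12 += x1 * x2
        if n in wanted:
            # a continuous function of three running means (Lemma V)
            out[n] = s12 / n - (s1 / n) * (s2 / n)
    return out

path = running_covariance([100, 10_000, 200_000])
print(path)
```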

Using Lemma V, it is easy to see that the coefficients $r_n$ of least squares equations converge with probability one to their $\beta$ values, where the $\beta$ value is obtained by substituting $\sigma_{ij}$ for $s_{ijn}$ in the expression for $r_n$, assuming, of course, independent random vectors which have the same distribution functions.

Since problems in the analysis of variance may be interpreted as problems in least squares, the above comments and Lemma V will generally make it possible, when determining limiting distributions, to consider the statistics to be functions of deviations from “true” mean functions rather than “sample” mean functions.

We shall discuss, briefly, four applications of these results. 

(a). The limiting distribution of the regression coefficient. Let $r_n$, the “sample” regression coefficient, be defined by the equation

$$r_n = \frac{\sum_\nu x_{i\nu}x_{j\nu}}{\sum_\nu x_{j\nu}^2},$$

where $x_{i\nu}$ and $x_{j\nu}$ are deviations from arithmetic means. Suppose the random vectors $(x_{i\nu}, x_{j\nu})$ are independently distributed for fixed $i$, $j$, with the same distribution functions, and that $E(x_{i\nu}) = E(x_{j\nu}) = 0$ and $E(x_{i\nu}x_{j\nu}) = \sigma_{ij}$. Then it follows from the strong law of large numbers that $\sum_\nu x_{j\nu}^2/n$ converges to $\sigma_{jj}$ with probability one. It also follows from the Laplace-Liapounoff theorem that $\sum_\nu(x_{i\nu}x_{j\nu} - \sigma_{ij})/\sqrt{n}$ has a normal limiting distribution with mean zero and variance $E\{(x_{i\nu}x_{j\nu} - \sigma_{ij})^2\}$. Hence, by Lemma V, $\sqrt{n}\,(r_n - \sigma_{ij}/\sigma_{jj})$ has a normal limiting distribution with mean zero and variance

$$\lim_{n\to\infty} E\Big\{n\Big(r_n - \frac{\sigma_{ij}}{\sigma_{jj}}\Big)^2\Big\},$$

unless that limit does not exist.
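A simulation makes the statement about $\sqrt{n}(r_n - \sigma_{ij}/\sigma_{jj})$ concrete. The Python sketch below assumes, for illustration only, $x_{j\nu}$ standard normal and $x_{i\nu} = 0.5x_{j\nu} + (e_\nu - 1)$, with $e_\nu$ exponential and independent of $x_{j\nu}$. Then $\sigma_{ij}/\sigma_{jj} = 0.5$ and the limiting variance works out to 1 in this case. The sketch computes $r_n$ from deviations from arithmetic means, as in the definition above. Names and sample sizes are hypothetical.

```python
import math
import random
import statistics

def scaled_errors(n: int, reps: int, beta: float = 0.5, seed: int = 3):
    """reps draws of sqrt(n) * (r_n - beta), with r_n computed from centred data."""
    rng = random.Random(seed)
    out = []
    for _ in range(reps):
        xj = [rng.gauss(0.0, 1.0) for _ in range(n)]
        xi = [beta * a + (rng.expovariate(1.0) - 1.0) for a in xj]
        mi, mj = sum(xi) / n, sum(xj) / n
        num = sum((a - mi) * (b - mj) for a, b in zip(xi, xj))
        den = sum((b - mj) ** 2 for b in xj)
        out.append(math.sqrt(n) * (num / den - beta))
    return out

errs = scaled_errors(n=200, reps=1500)
print(round(statistics.mean(errs), 3), round(statistics.variance(errs), 3))
```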
If the $x_{j\nu}$ are not random variables, then, in order to apply Corollary I with $p = 1$, it is necessary that

(4.1) $$\lim_{n\to\infty}\frac{\max_\nu|x_{j\nu}|}{\sqrt{\sum_\nu x_{j\nu}^2}} = 0.$$

In that case, the limiting distribution of $\sum_\nu x_{i\nu}x_{j\nu}\big/\sqrt{\sum_\nu x_{j\nu}^2}$ is normal with zero mean and variance $\sigma_{ii}$. If (4.1) is not satisfied, then there is no assurance, unless the $x_{i\nu}$ are normally distributed, that the limiting distribution of $\sum_\nu x_{i\nu}x_{j\nu}\big/\sqrt{\sum_\nu x_{j\nu}^2}$ is normal.
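Condition (4.1) is easy to examine for particular fixed sequences. In the Python sketch below, with a hypothetical function name, a linear trend $x_{j\nu} = \nu$ gives a ratio of order $\sqrt{3/n}$, which tends to zero. A geometric sequence $x_{j\nu} = 2^\nu$ gives a ratio tending to $\sqrt{3}/2$, so the condition fails and Corollary I gives no assurance of normality.

```python
import math

def max_weight(xj):
    """max_v |x_jv| / sqrt(sum_v x_jv^2): the d_n of the single row used with Corollary I."""
    norm = math.sqrt(sum(t * t for t in xj))
    return max(abs(t) for t in xj) / norm

trend = [max_weight([float(v) for v in range(1, n + 1)]) for n in (10, 100, 10_000)]
geometric = max_weight([2.0 ** v for v in range(1, 41)])
print(trend, geometric)
```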

(b). The limiting distribution of the analysis of variance ratio. The tests of significance which occur in the analysis of variance depend on the ratio of two quadratic forms, $q_{1n}$ and $q_{2n}$. The denominator $q_{2n}$ has rank (or degrees of freedom) $m_{2n}$ increasing with $n$, and the numerator $q_{1n}$ has rank $m_1$ not changing with $n$, i.e.,

$$V_n = \frac{q_{1n}/m_1}{q_{2n}/m_{2n}},$$

where $q_{1n} + q_{2n} + q_{3n} = \sum_\nu x_\nu^2$ and $q_{3n}$ is a quadratic form of rank $m_{3n}$ which will be identically zero if $n = m_1 + m_{2n}$. Since²² $q_{2n}$ is expressible as the variance of $x$ about a least squares equation, it follows from the previous discussion and Lemma V that $q_{2n}/m_{2n}$ converges with probability one to $\sigma^2$ under the assumptions that the $x_\nu$ are independently distributed with zero means and variances $\sigma^2$. Hence the limiting distribution of $V_n$ will depend only on the limiting distribution of $q_{1n}$, and it will consequently be necessary to consider only the matrix of $q_{1n}$ in order to apply Corollary VI with $p = 1$. For example,²³ suppose there are $pn$ independently distributed random variables $x_{i\nu}$ with zero means and variances $\sigma^2$, arranged in $p$ blocks of $n$ random variables each. Then

$$\sum_{i,\nu}(x_{i\nu} - \bar{x}_n)^2 = n\sum_i(\bar{x}_{in} - \bar{x}_n)^2 + \sum_{i,\nu}(x_{i\nu} - \bar{x}_{in})^2,$$

where $\bar{x}_{in}$ is the arithmetic mean of $x_{i1}, \ldots, x_{in}$ and $\bar{x}_n$ is the arithmetic mean of all the $x_{i\nu}$. Then

$$q_{1n} = n\sum_i(\bar{x}_{in} - \bar{x}_n)^2, \qquad q_{2n} = \sum_{i,\nu}(x_{i\nu} - \bar{x}_{in})^2,$$

$$m_1 = p - 1, \qquad m_{2n} = p(n - 1),$$


²² This has been proved by Kolodziejczyk [12, p. 161].
²³ Other schemes are given in Fisher [8].
and the matrix of $q_{1n}$ may be obtained by substituting for the $\bar{x}_{in}$ and $\bar{x}_n$. In this case, write $q_{1n}$ as $\sum_{i,j}a_{ij}S_iS_j$, where $S_i = \sum_\nu x_{i\nu}$, $a_{ii} = (p-1)/pn$ and, if $i \neq j$, $a_{ij} = -1/pn$. This form is sufficient to show that the maximum of the absolute values of the elements of the matrix of $q_{1n}$ approaches zero as $n$ increases. Hence, if the $x_{i\nu}$ satisfy the condition $\mathfrak{L}$, the limiting distribution of $m_1V_n$ is $G(v; p - 1, 1)$.
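The algebra of this example can be checked directly. The Python sketch below uses hypothetical function names and simulated data with $p = 4$ blocks of $n = 25$. It computes $q_{1n}$ from block means and again from the block sums $S_i$ with $a_{ii} = (p-1)/pn$ and $a_{ij} = -1/pn$. It also confirms that the between-block and within-block sums of squares add up to the total sum of squares about the grand mean.

```python
import random

def anova_forms(x):
    """x: p blocks of n observations. Returns (q1, q2, total) sums of squares."""
    p, n = len(x), len(x[0])
    means = [sum(block) / n for block in x]
    grand = sum(means) / p
    q1 = n * sum((m - grand) ** 2 for m in means)                         # between blocks
    q2 = sum((v - m) ** 2 for block, m in zip(x, means) for v in block)   # within blocks
    total = sum((v - grand) ** 2 for block in x for v in block)
    return q1, q2, total

def q1_from_block_sums(x):
    """q1 = sum_ij a_ij S_i S_j with a_ii = (p-1)/(pn) and a_ij = -1/(pn)."""
    p, n = len(x), len(x[0])
    s = [sum(block) for block in x]
    return sum(((p - 1) if i == j else -1) / (p * n) * s[i] * s[j] for i in range(p) for j in range(p))

rng = random.Random(6)
x = [[rng.expovariate(1.0) - 1.0 for _ in range(25)] for _ in range(4)]
q1, q2, total = anova_forms(x)
ratio = q1 / (q2 / (4 * (25 - 1)))      # m_1 V_n with m_2n = p(n - 1)
print(q1, q1_from_block_sums(x), ratio)
```

The largest coefficient, $(p-1)/pn$, is $0.03$ here and shrinks like $1/n$.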

Clearly, suppose only the rank of $q_{2n}$ increases as $n$ increases, the rank $m_{3n}$ of $q_{3n}$ being constant, and the maximum of the absolute values of the elements of the matrix of $q_{3n}$ also approaches zero as $n$ increases. Then $(q_{1n}/m_1)/(q_{3n}/m_{3n})$ will have a limiting distribution which is the analysis of variance distribution, and the limiting distribution of $q_{1n}/(q_{1n} + q_{3n})$ will be the correlation ratio distribution.

(c). Periodogram analysis. We need only remark that the linear functions which are used in the analysis of the Schuster periodogram²⁴ meet all the requirements of Corollary I if the $x_\nu$ are independently distributed with zero means and constant variances and satisfy the condition $\mathfrak{L}$. Consequently the large sample theory of the Schuster periodogram is the same for non-normal as it is for normal distributions.

(d) Multivariate analysis. We shall assume that the random vectors $X_1, \dots, X_n$ ($X_\alpha$ has components $x_{1\alpha}, \dots, x_{p\alpha}$) are independently distributed, that (2.3) and (2.4) are satisfied, and that the condition L is satisfied. For any fixed n and a we shall call the determinant $D_{an}$ of the forms (3.5) a generalized sum of squares, and the determinant $F_a$ of the elements … a generalized variance. We shall say that $D_{1n}$ and $F_1$ have rank $m_{1n}$ and that $D_{kn}$ and $F_k$ have rank $m_{kn}$. If $m_{1n}$ is constant, and if (3.7) and (3.8) are true, then clearly the limiting distribution of $D_{1n}$ is the distribution of the generalized variance of $m_{1n}$ vector observations*** from a normal distribution with zero means and covariance parameters $\sigma_{ij}$. Under the same conditions, the limiting distribution of $D_{1n}/F_1$ is the distribution of the generalized variance of $m_{1n}$ vector observations from a normal distribution with zero means and covariance parameters $\delta_{ij}$. Many other similar limiting distributions are immediately derivable.

Before completing our discussion of the limiting distributions of statistics 
occurring in multivariate analysis, we shall state a theorem on limiting distri- 
butions which is an obvious generalization of a theorem of Doob, [4, p. 166]. 

Suppose that the random variables $g(n)X_{1n}, \dots, g(n)X_{pn}$ have a distribution function $F_n(g(n)X_{1n}, \dots, g(n)X_{pn})$ which is such that

$$\lim_{n\to\infty} F_n(X_1, \dots, X_p) = F(X_1, \dots, X_p),$$

where $F(X_1, \dots, X_p)$ is a continuous distribution function, and suppose that $X_{in}$ converges in probability to the real number $\xi_i$. For example, if $X_{1n} = \sum_\alpha x_\alpha/n$, where $E(x_\alpha) = 0$, $E(x_\alpha^2) = 1$, and L is satisfied, then $X_{1n}$ converges to zero with probability one, and $\sqrt{n}\,X_{1n}$ has a limiting distribution which is normal with zero mean and unit variance, i.e.

$$\lim_{n\to\infty}\left|P\{\sqrt{n}\,X_{1n} < x\} - N(x; 1)\right| = 0.$$

** The theory of the Schuster periodogram is given by Fisher [7].

*** See Wilks, [18, p. 478] or Madow, [16, pp. 481, 484].

Theorem VII. Let $\varphi_j(t_1, \dots, t_p)$ be a function of $t_1, \dots, t_p$ defined in a neighborhood $N$ of $\xi_1, \dots, \xi_p$ which, together with its $(k_j + 1)$-th partial derivatives, is continuous in $N$. Suppose that $k_j$ is the least value of $r_j$ such that the random variables*

$$[g(n)]^{r_j}\left[\sum_{r=1}^{r_j}\frac{1}{r!}\sum_{i_1,\dots,i_r=1}^{p}\varphi_j^{(i_1\cdots i_r)}(\xi_1,\dots,\xi_p)\,(X_{i_1n}-\xi_{i_1})\cdots(X_{i_rn}-\xi_{i_r})\right] \qquad (j = 1, \dots, s)$$

have a joint limiting distribution function $D(x_1, \dots, x_s)$. Then the random variables $[g(n)]^{k_j}[\varphi_j(X_{1n}, \dots, X_{pn}) - \varphi_j(\xi_1, \dots, \xi_p)]$ have a joint limiting distribution which is given by $D(x_1, \dots, x_s)$. The value $k_j$ is greater than or equal to the minimum value for which not all the partial derivatives of order $k_j$ vanish at $\xi_1, \dots, \xi_p$.

The proof is almost word for word that of Doob, the only difference being 
the removal of the specializing words. 
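Below is a hedged Monte Carlo illustration of Theorem VII with $s = 1$ and $g(n) = \sqrt{n}$. The example is our own, not Madow's. Take $X_{1n}$ to be the mean of n independent normal variables with mean $\xi$ and unit variance, and $\varphi(t) = t^2$.

- If $\xi \neq 0$, the first derivative $2\xi$ does not vanish. Then $\sqrt{n}[\varphi(X_{1n}) - \varphi(\xi)]$ is approximately normal with variance $4\xi^2$.
- If $\xi = 0$, the first derivative vanishes and the second-order term is the one that survives, with scaling $[g(n)]^2 = n$. Then $nX_{1n}^2$ has the $\chi^2$ distribution with one degree of freedom, whose mean is 1.

The function name, seed and simulation sizes are assumptions chosen for the sketch.

```python
import random
import statistics

def simulate(xi, n, reps=2000, seed=1):
    """Draw reps sample means of n N(xi, 1) variables and return the scaled statistic:
    sqrt(n) * (mean**2 - xi**2) when xi != 0, and n * mean**2 when xi == 0."""
    rng = random.Random(seed)
    out = []
    for _ in range(reps):
        mean = sum(rng.gauss(xi, 1.0) for _ in range(n)) / n
        if xi != 0:
            out.append(n ** 0.5 * (mean ** 2 - xi ** 2))   # g(n) = sqrt(n), first-order term
        else:
            out.append(n * mean ** 2)                      # [g(n)]**2 = n, second-order term
    return out
```

With $\xi = 2$ the sample variance of `simulate(2.0, 200)` should be near 16. With $\xi = 0$ the sample mean of `simulate(0.0, 200)` should be near 1.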

We now consider the limiting distribution of the ratio of generalized sums of squares $L_n$, which is defined by



where $|b^{k,k+1}_{ij}|$ is the determinant of the forms $b^{k}_{ij} + b^{k+1}_{ij} = b^{k,k+1}_{ij}$. It has been shown that**

$$L_n = \prod_{i=1}^{p}\frac{Y_{ik}}{Y_{i,k+1}},$$

where $Y_{ij}$ $(j = k, k+1)$ is a ratio of generalized sums of squares

$$Y_{ij} = \frac{|b^{j}_{rs}|}{|b^{j}_{uv}|} \qquad (r, s = 1, \dots, i;\ u, v = 1, \dots, i-1;\ |b^{j}_{00}| = 1).$$

Since $Y_{ij}/m_{jn}$ converges with probability one to $|\sigma_{rs}|/|\sigma_{uv}|$, and since, by Corollary VIII, the joint limiting distribution of the $m_{k+1,n}(1 - Y_{ik}/Y_{i,k+1})$ is $\prod_i G(x_i; m_1, 1)$, it follows by Theorem VII that the joint limiting distribution of the ratios of generalized sums of squares

$$\prod_{i=1}^{j}\frac{Y_{ik}}{Y_{i,k+1}}$$

is

$$\prod_i G(x_i; i\,m_1, 1),$$

and that the limiting distribution of $m_{k+1,n}(1 - L_n)$ is***

$$G(x; p\,m_1, 1).$$

* See Goursat-Hedrick, [10, p. 107] for a statement of the Taylor expansion of functions of several variables, which we use here. By $\varphi^{(i_1\cdots i_r)}(\xi_1, \dots, \xi_p)$ is meant the value of $\partial^{r}\varphi(x_1, \dots, x_p)/\partial x_{i_1}\cdots\partial x_{i_r}$ at the point $\xi_1, \dots, \xi_p$.

** See Madow, [16, p.

In a following paper, these results will be extended to quadratic forms in non-central random variables.


5. Summary. In Section 2, Theorem I, we stated a very general form of the Laplace-Liapounoff theorem based on the Lindeberg condition. In four corollaries, this theorem was shown to provide joint limiting distributions for systems of linear forms of the following kind:

(i) if the coefficients are constants, the maximum of their absolute values converges to zero as the size of the sample increases;

(ii) if the coefficients are themselves random variables, that maximum converges in probability to zero as the size of the sample increases.

It was also shown that, under certain conditions, functions of several random variables have a normal multivariate limiting distribution when each function is a linear function of certain random variables for fixed values of random variables of lower index.

These results were extended to include limiting distributions of quadratic and bilinear forms in Section 3. The method of extension was to show that necessary and sufficient conditions for the existence of systems of linear forms satisfying the conditions of Section 2 are provided by rather simple conditions. The most important of these is that, as the size of the sample increases with the ranks of the forms remaining unaltered, the greatest of the absolute values of the elements of the matrices of the quadratic and bilinear forms approaches zero. This led to the theorem that quadratic and bilinear forms having such matrices have $\chi^2$, covariance, or Wishart's distributions as limiting distributions. It was then shown, in Theorem IV, that the system of quadratic and bilinear forms has Wishart's distribution in the limit, provided the other conditions are met, under two further conditions. First, the rank of the sum of the matrices of the quadratic and bilinear forms must equal the sum of the ranks of the matrices. Second, certain of these ranks must not change as the size of the sample increases. These results


*** A generalization of Wilks' result, [19, p. 323], to the case where the variates are not assumed to have a normal multivariate distribution may readily be obtained.





were then extended in Theorem V to one of the cases occurring when the coeffi- 
cients of the forms are themselves random variables. 

Several simple illustrations of the uses of the methods were given in Section 4. It was shown that analysis of variance ratios, and statistics occurring in the theory of multivariate statistical analysis, have the same limiting distributions which they would have had if their variables had been normally and independently distributed.
ON A TEST WHETHER TWO SAMPLES ARE FROM THE SAME POPULATION¹

By A. Wald² and J. Wolfowitz

1. The Problem.³ Let X and Y be two independent stochastic variables about whose cumulative distribution functions nothing is known except that they are continuous. Let $x_1, x_2, \dots, x_m$ be a set of m independent observations on X and let $y_1, \dots, y_n$ be a set of n independent observations on Y. It is desired to test the hypothesis (the null hypothesis) that the distribution functions of X and Y are identical.

An important step in statistical theory was made when "Student" proposed his ratio of mean to standard deviation for a similar purpose. In the problem treated by "Student" the distribution functions were assumed to be of known (normal) form and completely specified by two parameters. It is clear that in the problem to be considered here the distributions cannot be specified by any finite number of parameters.

It might nevertheless be argued that by virtue of the limit theorems of probability theory, "Student's" ratio might be used in our problem for large samples. Such a procedure is open to very serious objections. The population distributions may be of such form (e.g., Cauchy distribution) that the limit theorems do not apply. Furthermore, the distributions of X and Y may be radically different and yet have the same first two moments; clearly "Student's" ratio will not distinguish between two such distributions.

The Pearson contingency coefficient is a useful test specifically designed for the problem we are discussing here, but one which also possesses some disadvantages. The location of the class intervals is to a considerable extent arbitrary. In order to use the $\chi^2$ distribution, the numbers in each class interval must not be small; often this can be done only by having large class intervals, thus entailing a loss of information.

2. Preliminary remarks. Denote by $P\{X < x\}$ the probability of the relation in braces. Let $f(x)$ and $g(x)$ be the distribution functions of X and Y respectively; e.g., $P\{X < x\} = f(x)$. Throughout this paper we shall assume that $f(x)$ and $g(x)$ are continuous.

Let the set of $m + n$ elements $x_1, \dots, x_m$ and $y_1, \dots, y_n$ be arranged in

¹ Presented to the Institute of Mathematical Statistics at Philadelphia, December 27, 1939.

² Research under a grant-in-aid from the Carnegie Corporation of New York.

³ The authors are indebted to Prof. S. S. Wilks for proposing this problem to them.
ascending order of magnitude, and let the sequence be designated by Z, thus: $Z = z_1, z_2, \dots, z_{m+n}$, where $z_1 < z_2 < \dots < z_{m+n}$. ($f(x)$ and $g(x)$ were assumed to be continuous. Hence the probability is 0 that $z_i = z_{i+1}$, and therefore we may exclude this case.) Let $V = v_1, v_2, \dots, v_{m+n}$ be a sequence defined as follows: $v_i = 0$ if $z_i$ is a member of the set $x_1, \dots, x_m$, and $v_i = 1$ if $z_i$ is a member of the set $y_1, \dots, y_n$. It is easy to show that any statistic S used to test the null hypothesis should be invariant under any continuous, reciprocally one-to-one transformation of the real axis. That is to say, if $t' = \varphi(t)$ is any such transformation, then

(1) $S(x_1, \dots, x_m, y_1, \dots, y_n) = S(\varphi(x_1), \dots, \varphi(x_m), \varphi(y_1), \dots, \varphi(y_n)).$

The reason for this requirement on S is the fact that the transformed stochastic variables $X' = \varphi(X)$ and $Y' = \varphi(Y)$ are continuous and have identical distributions if and only if X and Y have identical distributions. Hence S must be a function of V only, with the added restriction that $S(V) = S(V')$, where $V' = v_{m+n}, v_{m+n-1}, \dots, v_1$. For if S were a function of $x_1, \dots, x_m, y_1, \dots, y_n$ which cannot be expressed as a function of V alone, then there exists a continuous reciprocally one-to-one transformation $t' = \varphi(t)$ such that (1) is not true. On the other hand, any continuous reciprocally one-to-one transformation of the entire line into itself is monotonic and hence either leaves V invariant or else transforms it into V'.
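Here is a small sketch of the construction of V and of the invariance argument. It uses Python and illustrative data of our own choosing; `indicator_sequence` is a hypothetical helper name. An increasing transformation such as arctan leaves V unchanged, while a decreasing one such as $t \mapsto -t^3$ reverses it, giving V'.

```python
import math

def indicator_sequence(xs, ys):
    """V: merge the samples in ascending order, writing 0 for an x and 1 for a y.
    Assumes no ties (they have probability zero for continuous distributions)."""
    merged = sorted([(value, 0) for value in xs] + [(value, 1) for value in ys])
    return [label for _, label in merged]

xs, ys = [0.3, 1.7, -2.0], [0.9, 5.0]
v = indicator_sequence(xs, ys)
# An increasing transformation leaves V unchanged ...
v_increasing = indicator_sequence([math.atan(t) for t in xs], [math.atan(t) for t in ys])
# ... and a decreasing one turns it into V' (V read backwards).
v_decreasing = indicator_sequence([-t ** 3 for t in xs], [-t ** 3 for t in ys])
```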

3. Previous results. In an interesting paper on this problem W. R. Thompson [1] proceeds as follows. Let the sets $x_1, \dots, x_m$ and $y_1, \dots, y_n$ be ordered in ascending order of magnitude, thus: $x_{p_1}, x_{p_2}, \dots, x_{p_m}$ and $y_{p'_1}, y_{p'_2}, \dots, y_{p'_n}$, where $x_{p_1} < x_{p_2} < \dots < x_{p_m}$ and $y_{p'_1} < y_{p'_2} < \dots < y_{p'_n}$. Let $P\{x_{p_k} < y_{p'_{k'}}\}$ denote the probability of the relation in braces under the null hypothesis ($f(x) \equiv g(x)$). This probability is shown to be independent of $f(x)$, and the relation

(2) $P\{x_{p_k} < y_{p'_{k'}}\} = \psi(m, n, k, k')$

holds, where the right member, which is given explicitly by Thompson, is a function only of the arguments exhibited. To make a test of the null hypothesis with, say, a 5% level of significance, this writer proposes to choose k and k' so that $\psi(m, n, k, k') = .05$. The test would then consist of noticing whether $x_{p_k} < y_{p'_{k'}}$ or not. In the former case the null hypothesis is to be considered as disproved.
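Thompson gives ψ explicitly; we do not reproduce his formula. The following independent sketch is our own derivation, not Thompson's expression. Under the null hypothesis all arrangements of x's and y's in the combined ordering are equally likely (Section 4). The k-th smallest x precedes the k'-th smallest y exactly when at least k of the first $k + k' - 1$ positions are occupied by x's, so ψ is a hypergeometric tail. The function names are hypothetical, and the brute-force enumeration checks the counting argument for small samples.

```python
from itertools import combinations
from math import comb

def psi(m, n, k, kp):
    """P{k-th smallest x < kp-th smallest y} under the null hypothesis (1 <= k <= m, 1 <= kp <= n)."""
    first = k + kp - 1            # the event holds iff >= k x's fall in the first k + kp - 1 places
    favourable = sum(comb(first, j) * comb(m + n - first, m - j)
                     for j in range(k, min(m, first) + 1))
    return favourable / comb(m + n, m)

def psi_by_enumeration(m, n, k, kp):
    """Same probability by listing every equally likely set of x positions."""
    hits = 0
    for xpos in combinations(range(m + n), m):
        taken = set(xpos)
        ypos = [i for i in range(m + n) if i not in taken]
        hits += xpos[k - 1] < ypos[kp - 1]
    return hits / comb(m + n, m)
```

Thompson's procedure would then choose k and k' so that `psi(m, n, k, kp)` is close to .05.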

It is clear that this test cannot be very efficient, ignoring as it does so many 
of the relations among the observations. Except under certain rather narrow 
restrictions on the admissible alternatives, for example, that $g(x) \equiv f(x + c)$,
where c is an arbitrary constant, the test suffers the further defect of not being 
“consistent” in a way which will be discussed below. Hence the test suggested 
by Thompson can scarcely be regarded as a satisfactory solution of the problem. 
This criticism, of course, does not apply to those sections of Thompson’s paper 
which deal with the question of estimating the so-called normal range. 





4. The statistic U. A subsequence $v_{s+1}, v_{s+2}, \dots, v_{s+r}$ of V (where r may also be 1) will be called a "run" if $v_{s+1} = v_{s+2} = \dots = v_{s+r}$, and if $v_s \neq v_{s+1}$ when $s > 0$ and $v_{s+r} \neq v_{s+r+1}$ when $s + r < m + n$. For example, V = 1, 0, 0, 1, 1, 0 contains the following runs: 1; 0, 0; 1, 1; 0. The statistic⁴ U, defined as the number of runs in V, seems a suitable statistic for testing the hypothesis that $f(x) \equiv g(x)$. In the event that the latter identity holds, the distribution of U is independent of $f(x)$. A difference between $f(x)$ and $g(x)$ tends to decrease U. U is consistent in a sense which will be discussed below.
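Counting runs only requires comparing neighbours: U is one plus the number of indices i with $v_i \neq v_{i+1}$. A minimal Python sketch follows; the name `count_runs` is ours.

```python
def count_runs(v):
    """U: the number of runs in a sequence, i.e. one plus the number of adjacent unequal pairs."""
    if not v:
        return 0
    return 1 + sum(a != b for a, b in zip(v, v[1:]))
```

For instance, `count_runs([1, 0, 0, 1, 1, 0])` gives 4, matching the runs 1; 0, 0; 1, 1; 0 listed above.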
In order to derive the distribution of U under the null hypothesis, we first note that all the $\binom{m+n}{m}$ $\left(= \frac{(m+n)!}{m!\,n!}\right)$ possible sequences V have the same probability. For, consider the sequence V where $v_i = 0$ $(i = 1, 2, \dots, m)$ and $v_i = 1$ $(i = m+1, m+2, \dots, m+n)$. Clearly the probability of this sequence is

$$q = \frac{m(m-1)\cdots 1\cdot n(n-1)\cdots 1}{(m+n)(m+n-1)\cdots(n+1)\,n(n-1)\cdots 1}.$$

Furthermore, the probability of any other sequence is equal to the product of the factors in the numerator of q taken in a different order, divided by the product of the factors in the denominator taken in the same order. The quotient is, of course, equal to q.
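The argument that every V has probability $q = m!\,n!/(m+n)!$ can be mirrored in code. Build V from the smallest observation upward. At each step multiply by (number of x's not yet placed)/(number of observations not yet placed), or by the analogous ratio for a y. This sequential description is our paraphrase of the factor-reordering argument above. The helper name is hypothetical, and exact arithmetic uses `fractions.Fraction`.

```python
from fractions import Fraction
from itertools import combinations
from math import factorial

def pattern_probability(v):
    """Null probability of the 0/1 pattern V, filling positions from the smallest value up:
    the next element is an x with probability (x's left)/(observations left), else a y."""
    x_left, y_left = v.count(0), v.count(1)
    prob = Fraction(1)
    for label in v:
        remaining = x_left + y_left
        if label == 0:
            prob *= Fraction(x_left, remaining)
            x_left -= 1
        else:
            prob *= Fraction(y_left, remaining)
            y_left -= 1
    return prob

m, n = 3, 4
patterns = [[0 if i in chosen else 1 for i in range(m + n)]
            for chosen in map(set, combinations(range(m + n), m))]
q = Fraction(factorial(m) * factorial(n), factorial(m + n))   # 1/35 for m = 3, n = 4
```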

Let $e_0$ be the number of runs in V whose elements are 0 and let $e_1$ be the number of runs whose elements are 1. Obviously $U = e_0 + e_1$. Let the runs of each kind be arranged in the ascending order of the indices of the $v_i$. Let $r_{0j}$ be the number of elements 0 in the j-th run of that kind $(j = 1, 2, \dots, e_0)$ and let $r_{1j'}$ be the number of elements 1 in the j'-th run of that kind $(j' = 1, 2, \dots, e_1)$. The following relations obviously hold:

(3) $\sum_{j=1}^{e_0} r_{0j} = m,$

(4) $\sum_{j'=1}^{e_1} r_{1j'} = n,$

(5) $1 \le e_0 \le m, \qquad 1 \le e_1 \le n,$

(6) $|e_0 - e_1| \le 1.$


⁴ When this paper was already in proof, our attention was called to a paper by W. L. Stevens, entitled "Distribution of groups in a sequence of alternatives," Annals of Eugenics, Vol. 9 (1939). There a statistic, which is essentially the U statistic, is proposed for a problem different from that considered by us, and the distribution of U is obtained in a different manner. However, that paper does not contain the application of the U statistic for the purpose herein described, the proof of consistency, or the other results of our paper.
Hence if $U = 2k$, then $e_0 = e_1 = k$. If $U = 2k - 1$, then either $e_0 = k$, $e_1 = k - 1$, or $e_0 = k - 1$, $e_1 = k$. The element $v_1$ of V, together with the numbers $r_{01}, r_{02}, \dots, r_{0e_0}, r_{11}, r_{12}, \dots, r_{1e_1}$, completely determines the sequence V, whose probability is q.

Without loss of generality we may assume that $m \le n$. If $U = 2k$, $1 \le k \le m$, $v_1 = 0$, any two sequences of k positive numbers each may constitute a sequence of $r_{01}, \dots, r_{0e_0}$, $r_{11}, \dots, r_{1e_1}$, provided only that (3) and (4) are satisfied. The number of sequences $r_{01}, r_{02}, \dots, r_{0k}$ which satisfy (3) is the coefficient of $a^m$ in the purely formal expansion of

$$(a + a^2 + a^3 + \cdots)^k = \left(\frac{a}{1-a}\right)^k$$

and hence is $\binom{m-1}{k-1}$. Similarly the number of sequences $r_{11}, r_{12}, \dots, r_{1k}$ which satisfy (4) is found to be $\binom{n-1}{k-1}$. Bearing in mind the case $U = 2k$, $v_1 = 1$, we obtain

(7) $p\{U = 2k\} = \dfrac{2\binom{m-1}{k-1}\binom{n-1}{k-1}}{\binom{m+n}{m}} \qquad (k = 1, 2, \dots, m),$

where the left member denotes the probability of the relation in braces under the null hypothesis. In a similar manner we obtain

(8) $p\{U = 2k - 1\} = \dfrac{\binom{m-1}{k-1}\binom{n-1}{k-2} + \binom{m-1}{k-2}\binom{n-1}{k-1}}{\binom{m+n}{m}} \qquad (k = 2, \dots, m+1),$

with the proviso that $\binom{a}{b} = 0$ if $a < b$.
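Formulas (7) and (8) can be checked against brute force for small samples. In the sketch below (our own; `c`, `runs_pmf` and `runs_pmf_by_enumeration` are hypothetical names), `c` implements the binomial coefficient with the proviso just stated. The enumeration counts runs in every one of the $\binom{m+n}{m}$ equally likely arrangements. It assumes $1 \le m \le n$ as in the text.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations
from math import comb

def c(a, b):
    """Binomial coefficient with the proviso of (8): zero when b < 0 or a < b."""
    return comb(a, b) if 0 <= b <= a else 0

def runs_pmf(m, n):
    """Null distribution of U from (7) and (8); assumes 1 <= m <= n."""
    total = comb(m + n, m)
    pmf = {}
    for k in range(1, m + 1):                       # (7): U = 2k
        pmf[2 * k] = Fraction(2 * c(m - 1, k - 1) * c(n - 1, k - 1), total)
    for k in range(2, m + 2):                       # (8): U = 2k - 1
        pmf[2 * k - 1] = Fraction(c(m - 1, k - 1) * c(n - 1, k - 2)
                                  + c(m - 1, k - 2) * c(n - 1, k - 1), total)
    return {u: p for u, p in pmf.items() if p}

def runs_pmf_by_enumeration(m, n):
    """Same distribution by counting runs in every equally likely arrangement."""
    counts = Counter()
    for xpos in combinations(range(m + n), m):
        taken = set(xpos)
        v = [0 if i in taken else 1 for i in range(m + n)]
        counts[1 + sum(a != b for a, b in zip(v, v[1:]))] += 1
    total = comb(m + n, m)
    return {u: Fraction(cnt, total) for u, cnt in counts.items()}
```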

We shall now briefly indicate a method of obtaining the mean $E(U)$ and variance $\sigma^2(U)$ of U. For example, $E(U)$ may be obtained by performing several summations of the type

(9) $\displaystyle\sum_{i=0}^{m-1} i\binom{m-1}{i}\binom{n-1}{i}.$

It is easy to verify that the expression (9) is the term free of a in the purely formal expansion in a of

(10) $(m-1)\,a\,(1+a)^{m-2}\left(1+\dfrac{1}{a}\right)^{n-1}$

and hence is

(11) $(m-1)\dbinom{m+n-3}{n-2}.$
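The identity behind (9), (10) and (11) is easy to test numerically. This sketch (function names are ours) does three things:

- computes the summation (9) directly;
- extracts the term free of a from (10) by multiplying out the two binomial expansions;
- compares both with (11).

It assumes $m, n \ge 2$ so that all the binomial coefficients are defined.

```python
from math import comb

def summation_9(m, n):
    """The sum of type (9): sum over i of i * C(m-1, i) * C(n-1, i)."""
    return sum(i * comb(m - 1, i) * comb(n - 1, i) for i in range(m))

def constant_term_10(m, n):
    """Term free of a in (m-1) * a * (1+a)**(m-2) * (1+1/a)**(n-1), expanded term by term."""
    free = 0
    for i in range(m - 1):            # C(m-2, i) a**i from (1+a)**(m-2)
        for j in range(n):            # C(n-1, j) a**(-j) from (1+1/a)**(n-1)
            if 1 + i - j == 0:        # the extra factor a contributes exponent 1
                free += comb(m - 2, i) * comb(n - 1, j)
    return (m - 1) * free

def closed_form_11(m, n):
    return (m - 1) * comb(m + n - 3, n - 2)
```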
The other summations required for the mean and variance can be carried out in a similar manner. We shall omit these tedious calculations. The results are:

(12) $E(U) = \dfrac{2mn}{m+n} + 1,$

(13) $\sigma^2(U) = \dfrac{2mn(2mn - m - n)}{(m+n)^2(m+n-1)}.$

The critical region for testing the null hypothesis on a level of significance β is given by the inequality $U < u_0$, where $u_0$ is a function of m and n such that $P\{U < u_0\} = \beta$.
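Given the exact distribution, (12), (13) and the critical value can be computed directly. The sketch below (our own; names are hypothetical) rebuilds the distribution from (7) and (8) with exact fractions and computes its mean and variance. It then picks $u_0$ as the largest value with $P\{U < u_0\} \le \beta$, because for a discrete statistic an exact equality $P\{U < u_0\} = \beta$ is usually not attainable. It assumes $m \le n$.

```python
from fractions import Fraction
from math import comb

def runs_pmf(m, n):
    """Null distribution of U from (7) and (8) (assumes 1 <= m <= n)."""
    def c(a, b):
        return comb(a, b) if 0 <= b <= a else 0
    total = comb(m + n, m)
    pmf = {2 * k: Fraction(2 * c(m - 1, k - 1) * c(n - 1, k - 1), total) for k in range(1, m + 1)}
    for k in range(2, m + 2):
        pmf[2 * k - 1] = Fraction(c(m - 1, k - 1) * c(n - 1, k - 2)
                                  + c(m - 1, k - 2) * c(n - 1, k - 1), total)
    return pmf

def moments(m, n):
    """Exact mean and variance of U computed from the distribution."""
    pmf = runs_pmf(m, n)
    mean = sum(u * p for u, p in pmf.items())
    var = sum(u * u * p for u, p in pmf.items()) - mean ** 2
    return mean, var

def critical_value(m, n, beta):
    """Largest u0 with P{U < u0} <= beta."""
    pmf = runs_pmf(m, n)
    u0, below = 2, Fraction(0)          # P{U < 2} = 0 because U >= 2
    for u in range(2, m + n + 1):
        below += pmf.get(u, 0)          # below is now P{U <= u} = P{U < u + 1}
        if below > beta:
            break
        u0 = u + 1
    return u0
```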


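5. The asymptotic distribution of U. Let $m/n = \alpha$, a positive constant. Then, as $m \to \infty$,

$$E(U) \sim \frac{2m}{1+\alpha}, \qquad \sigma^2(U) \sim \frac{4\alpha m}{(1+\alpha)^3}.$$

Theorem I. If t is any real number, the probability of the relation

$$U < \frac{2m}{1+\alpha} + t\sqrt{\frac{4\alpha m}{(1+\alpha)^3}}$$

converges uniformly in t to

$$\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{t} e^{-w^2/2}\,dw$$

as $m \to \infty$.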
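Theorem I can be compared with the exact distribution for moderately large samples. The sketch below evaluates $P\{U < 2m/(1+\alpha) + t\sqrt{4\alpha m/(1+\alpha)^3}\}$ from (7) and (8) and reports the largest gap from the normal integral. Its assumptions are Python floats, $t \in \{-1, 0, 1\}$, and helper names of our own. For $m = n$ the gap shrinks as m grows, as the theorem asserts. It does not shrink quickly, because the centring $2m/(1+\alpha)$ differs from E(U) in (12) by 1 and no continuity correction is applied.

```python
from math import comb, erf, sqrt

def runs_pmf_float(m, n):
    """Null probabilities of U from (7) and (8) as floats (assumes 1 <= m <= n)."""
    def c(a, b):
        return comb(a, b) if 0 <= b <= a else 0
    total = comb(m + n, m)
    pmf = {}
    for k in range(1, m + 2):
        if k <= m:
            pmf[2 * k] = 2 * c(m - 1, k - 1) * c(n - 1, k - 1) / total
        if k >= 2:
            pmf[2 * k - 1] = (c(m - 1, k - 1) * c(n - 1, k - 2)
                              + c(m - 1, k - 2) * c(n - 1, k - 1)) / total
    return pmf

def normal_cdf(t):
    return 0.5 * (1.0 + erf(t / sqrt(2.0)))

def max_error(m, n, ts=(-1.0, 0.0, 1.0)):
    """Largest gap between P{U < 2m/(1+a) + t*sqrt(4am/(1+a)**3)} and Phi(t), with a = m/n."""
    a = m / n
    pmf = runs_pmf_float(m, n)
    worst = 0.0
    for t in ts:
        bound = 2 * m / (1 + a) + t * sqrt(4 * a * m / (1 + a) ** 3)
        exact = sum(p for u, p in pmf.items() if u < bound)
        worst = max(worst, abs(exact - normal_cdf(t)))
    return worst
```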

The proof of this theorem is essentially the same as the classical proof that the binomial law converges to the normal distribution (see, for example, Fréchet [2], p. 89), and it will be unnecessary to give the details. Since the asymptotic distribution of the subpopulation of even U is the same as that of odd U, it will be sufficient to consider only the right member of (7). Let $m' = m - 1$, $n' = n - 1$, and $k' = k - 1$. We make the substitution

(14) $w = \dfrac{k' - \dfrac{m'}{1+\alpha'}}{\sqrt{m'}}, \quad \text{where } \alpha' = \dfrac{m'}{n'},$

(15) $dw = \dfrac{1}{\sqrt{m'}},$

and evaluate the factorials by Stirling's formula. We shall give here only the results of successive simplifications. At each step we shall omit the factors free of k or w, since their product may be reconstructed from the final exponential form. Thus instead of the right member of (7) we can consider the expression:

(16) $\dbinom{m'}{k'}\dbinom{n'}{k'}.$
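Omitting factors free of k, we get

(17) $\dfrac{1}{k'!\,(m'-k')!\,k'!\,(n'-k')!},$

and by Stirling's formula, since k and m are both large:

(18) $\dfrac{1}{k'^{\,2k'+1}\,(m'-k')^{m'-k'+\frac12}\,(n'-k')^{n'-k'+\frac12}}.$

Now apply (14). We obtain

(19) $\left(\dfrac{m'}{1+\alpha'} + w\sqrt{m'}\right)^{-\frac{2m'}{1+\alpha'} - 2w\sqrt{m'} - 1}\left(\dfrac{m'\alpha'}{1+\alpha'} - w\sqrt{m'}\right)^{-\frac{m'\alpha'}{1+\alpha'} + w\sqrt{m'} - \frac12}\left(\dfrac{m'}{\alpha'(1+\alpha')} - w\sqrt{m'}\right)^{-\frac{m'}{\alpha'(1+\alpha')} + w\sqrt{m'} - \frac12}.$

Dividing inside the parentheses by $\dfrac{m'}{1+\alpha'}$, $\dfrac{m'\alpha'}{1+\alpha'}$, $\dfrac{m'}{\alpha'(1+\alpha')}$, respectively, and again omitting factors free of w, we get

(20) $\left(1 + \dfrac{(1+\alpha')w}{\sqrt{m'}}\right)^{-\frac{2m'}{1+\alpha'} - 2w\sqrt{m'} - 1}\left(1 - \dfrac{(1+\alpha')w}{\alpha'\sqrt{m'}}\right)^{-\frac{m'\alpha'}{1+\alpha'} + w\sqrt{m'} - \frac12}\left(1 - \dfrac{\alpha'(1+\alpha')w}{\sqrt{m'}}\right)^{-\frac{m'}{\alpha'(1+\alpha')} + w\sqrt{m'} - \frac12}.$

Taking logarithms, expanding in powers of $w/\sqrt{m'}$, and neglecting terms in $w^3/\sqrt{m'}$ and higher orders, the results are

(21) $\left[-2w\sqrt{m'} - (1+\alpha')w^2\right] + \left[w\sqrt{m'} - \dfrac{(1+\alpha')w^2}{2\alpha'}\right] + \left[w\sqrt{m'} - \dfrac{\alpha'(1+\alpha')w^2}{2}\right] + O(m'^{-1/2}),$

which equals

(22) $-\dfrac{(1+\alpha')^3}{2\alpha'}\,w^2 + O(m'^{-1/2}).$

The proof of the fact that the distribution of w converges uniformly to the normal distribution with zero mean and variance $\dfrac{\alpha'}{(1+\alpha')^3}$ can be carried out in the same way as the classical proof that the binomial law converges to the normal distribution.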
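The expansion leading to (22) can be checked numerically. The sketch below is our own. It uses `math.lgamma` for exact log-factorials rather than Stirling's formula, and the sample values of $m'$, $\alpha'$ and w are arbitrary. It compares the change in $\log[\binom{m'}{k'}\binom{n'}{k'}]$, as $k'$ moves from $m'/(1+\alpha')$ to $m'/(1+\alpha') + w\sqrt{m'}$, with $-(1+\alpha')^3w^2/(2\alpha')$.

```python
from math import lgamma, sqrt

def log_binom(N, k):
    """log C(N, k), with real arguments allowed, via the log-gamma function."""
    return lgamma(N + 1) - lgamma(k + 1) - lgamma(N - k + 1)

def exponent_check(m1, alpha1, w):
    """Return (exact, predicted): the change in log[C(m', k') C(n', k')] when k' moves from
    m'/(1+alpha') to m'/(1+alpha') + w*sqrt(m'), and -(1+alpha')**3 * w**2 / (2*alpha')."""
    n1 = m1 / alpha1
    k0 = m1 / (1 + alpha1)
    k = k0 + w * sqrt(m1)
    exact = (log_binom(m1, k) + log_binom(n1, k)) - (log_binom(m1, k0) + log_binom(n1, k0))
    predicted = -(1 + alpha1) ** 3 * w ** 2 / (2 * alpha1)
    return exact, predicted
```

For example, `exponent_check(10**6, 1.0, 0.5)` gives two numbers both close to -1.0.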
It is obvious that

$$\frac{k - \frac{m}{1+\alpha}}{\sqrt{m}}$$

has the same limiting distribution as w. From this, and from the fact that $U = 2k$ or $2k - 1$, Theorem I follows.

In using conventional tables of the Gaussian function to make tests of sig- 
nificance on U when m and n are large, the reader is urged not to forget that the 
critical region of U lies in only one tail of the curve. 


6. An example. We give here a simple example illustrating the use of the statistic U and Theorem I.

Suppose 50 observations were made on X and 50 observations on Y. Suppose further that these observations are arranged in ascending order and that the i-th element of this sequence is said to have the rank i. The observations on X occupy the following ranks: 1, 5, 6, 7, 12, 13, 14, 15, 16, 17, 18, 20, 21, 25, 26, 27, 28, 31, 32, 38, 42, 43, 44, 45, 50, 51, 52, 53, 54, 55, 57, 58, 62, 63, 64, 65, 68, 69, 76, 79, 80, 81, 86, 87, 89, 90, 91, 93, 94, 95.

The observations on Y occupy the remaining ranks.

In this case, $U = 34$.

For $m = n = 50$,

$E(U) = 51$,

$\sigma^2(U) = 24.747$.

The probability of getting 34 runs or less when the distribution functions of X and Y are continuous and identical is therefore less than $5 \cdot 10^{-4}$.
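The numbers in this example can be reproduced as follows. The sketch assumes the rank list as printed above and uses the one-tailed normal approximation of Theorem I without continuity correction. The names are ours.

```python
from math import erf, sqrt

X_RANKS = [1, 5, 6, 7, 12, 13, 14, 15, 16, 17, 18, 20, 21, 25, 26, 27, 28, 31, 32, 38,
           42, 43, 44, 45, 50, 51, 52, 53, 54, 55, 57, 58, 62, 63, 64, 65, 68, 69, 76, 79,
           80, 81, 86, 87, 89, 90, 91, 93, 94, 95]

def runs_from_ranks(x_ranks, total):
    """U for the sequence V over ranks 1..total, with 0 at the x ranks and 1 elsewhere."""
    xs = set(x_ranks)
    v = [0 if r in xs else 1 for r in range(1, total + 1)]
    return 1 + sum(a != b for a, b in zip(v, v[1:]))

m = n = 50
u = runs_from_ranks(X_RANKS, m + n)
mean = 2 * m * n / (m + n) + 1                                          # (12)
var = 2 * m * n * (2 * m * n - m - n) / ((m + n) ** 2 * (m + n - 1))    # (13)
z = (u - mean) / sqrt(var)
p_normal = 0.5 * (1 + erf(z / sqrt(2)))     # lower tail only: the critical region is U < u0
```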


7. Consistency. We shall say that a test is “consistent” if the probability 
of rejecting the null hypothesis when it is false (i.e., the complement of the 
probability of a type II error, cf. Neyman and Pearson, [3]) approaches one 
as the sample number approaches infinity. In the literature of statistics a 
function of the observations which converges stochastically to a population 
parameter as the sample number approaches infinity, is called a “consistent” 
statistic. If a test of a hypothesis about a population parameter is made by a 
proper use of a consistent (statistic) estimate of the parameter, the test will 
be consistent also according to our definition, which thus furnishes an extension 
of the idea of consistency to the case where the alternatives to the null hypothe- 
sis cannot be specified by a finite number of parameters. 

It is obvious that consistency ought to be a minimal requirement of any good test. It is the purpose of this section to prove that the test furnished by the statistic U is consistent, subject to some restrictions on the distribution functions. These restrictions are slight and, from the practical statistical point of view, unimportant.

We shall say that the distribution functions $f(x)$ and $g(x)$ satisfy the condition A if, for any arbitrarily small positive δ, there exist a finite number of closed intervals with the following two properties. First, the probability of the sum I of these intervals is $> 1 - \delta$ according to at least one of the distribution functions $f(x)$ and $g(x)$. Second, $f(x)$ and $g(x)$ have positive continuous derivatives $f'(x)$ and $g'(x)$ in I.

In all that follows, although m and n are considered as variables, their ratio m/n is to be a constant, denoted by α. Let $\beta > 0$ denote the level of significance on which the test is to be made, so that, if $f(x) \equiv g(x)$,

(23) $P\{U < u_0(m)\} = \beta,$

where the critical region for two samples of size m and n, respectively, is given by

$U < u_0(m).$

Theorem II. If $f(x)$ and $g(x)$ satisfy condition A, and if

(24) $f(x) \not\equiv g(x),$

then

(25) $\lim_{m\to\infty} P\{U < u_0(m)\} = 1.$

The proof of this theorem will be given in several stages.

Let $E(U/m; f, g)$ and $\sigma^2(U/m; f, g)$ denote the mean and variance, respectively, of $U/m$, when X and Y have the distribution functions $f(x)$ and $g(x)$, respectively, and the sample numbers are m and n. Let the set $x_1, \dots, x_m;\ y_1, \dots, y_n$ be arranged in ascending order of magnitude, thus:

(26) $Z = z_1, z_2, \dots, z_{m+n},$

where $z_1 < z_2 < \dots < z_{m+n}$. The sequence

(27) $V = v_1, v_2, \dots, v_{m+n}$

is defined as follows: $v_i = 0$ if $z_i$ is a member of the set $x_1, \dots, x_m$, and $v_i = 1$ if $z_i$ is a member of the set $y_1, \dots, y_n$.

Lemma 1. If the following are fulfilled:

a) $f(x) = 0$ for $x < 0$, $f(x) = x$ for $0 \le x \le 1$, $f(x) = 1$ for $x > 1$;

b) $g(x) = 0$ for $x < 0$, $g(x) = 1$ for $x > 1$;

c) the derivative $g'(x)$ of $g(x)$ exists, and is continuous and positive everywhere in the interval $0 \le x \le 1$;

d) k is an arbitrary but fixed positive integer, and for every m, $i_{1m} < i_{2m} < \dots < i_{km}$ are a set of k positive integers subject only to the restriction that the least upper bound γ of the sequence $i_{km}/(m+n)$ is less than 1;

then the expected value

$$E(v_{i_{1m}} v_{i_{2m}} \cdots v_{i_{km}})$$

satisfies the inequality

(28) $\left| E(v_{i_{1m}} v_{i_{2m}} \cdots v_{i_{km}}) - \prod_{j=1}^{k}\dfrac{g'(a_{\lambda_{jm}})}{\alpha + g'(a_{\lambda_{jm}})} \right| < \varphi(m),$

where $\lambda_{jm} = i_{jm}/(m+n)$ and $a_{\lambda_{jm}}$ $(j = 1, \dots, k)$ is the root of

(29) $m\,a_{\lambda_{jm}} + n\,g(a_{\lambda_{jm}}) = \lambda_{jm}(m+n),$

and $\varphi(m)$ depends only on m and is such that

(30) $\lim_{m\to\infty}\varphi(m) = 0.$

It is easy to verify that the root $a_{\lambda_{jm}}$ of (29) exists and is unique.

Proof: It will be sufficient to show the following. For any specified set of values of

$$v_{i_{1m}}, \dots, v_{i_{(r-1)m}}, v_{i_{(r+1)m}}, \dots, v_{i_{km}} \qquad (r = 1, \dots, k),$$

the conditional probability $P\{v_{i_{rm}} = 1\}$ of the relation in braces satisfies the inequality

(31) $\left|\dfrac{g'(a_{\lambda_{rm}})}{\alpha + g'(a_{\lambda_{rm}})} - P\{v_{i_{rm}} = 1\}\right| < \psi(m),$

where $\psi(m)$ depends only on m and is such that

(32) $\lim_{m\to\infty}\psi(m) = 0.$

For each m let

(33) $V'_m = v'_{i_{1m}}, \dots, v'_{i_{(r-1)m}}, v'_{i_{(r+1)m}}, \dots, v'_{i_{km}}$

be a fixed sequence whose elements are either 0 or 1. We shall consider the conditional probability $P\{v_{i_{rm}} = e\}$ $(e = 0, 1)$ of the relation in braces subject to the condition that

(34) $v_{i_{jm}} = v'_{i_{jm}} \qquad (j = 1, 2, \dots, r-1, r+1, r+2, \dots, k).$

Let a and b be two numbers such that $0 < a < b < 1$. Let $m^*$ be a non-negative integer such that $m^* \le m$ and $m^* \le [\gamma(m+n)]$, where $[\gamma(m+n)]$ denotes the largest integer $\le \gamma(m+n)$. Let $Q_m(a, b, m^*)$ denote the probability that, if $m^*$ observations are made on X and $[\gamma(m+n)] - m^*$ observations are made on Y, the following conditions will be fulfilled:

(a) the total number of observations $< a$ is exactly $i_{rm} - 1$;

(b) all observations are $< b$;

(c) if the $[\gamma(m+n)]$ observations are arranged in ascending order, and if $v^*_j = 0$ or 1 according as the j-th element is an observation on X or on Y, then

(35) $v^*_{i_{jm}} = v'_{i_{jm}} \qquad (j = 1, 2, \dots, r-1),$

and

(36) $v^*_{i_{jm}-1} = v'_{i_{jm}} \qquad (j = r+1, r+2, \dots, k).$

It is easy to see that the probability $P_0$ of the simultaneous fulfillment of the relations (34) and of $v_{i_{rm}} = 0$ is given by

(37) $P_0 = \displaystyle\int_0^1\int_0^b \sum_{m^*} A(a, b, m^*)\, m'(1-b)^{m'-1}(1-g(b))^{n'}\,da\,db,$

where

(38) $A(a, b, m^*) = \dfrac{\partial}{\partial b}Q_m(a, b, m^*),$

(39) $m' = m - m^*,$

and

(40) $n' = n - [\gamma(m+n)] + m^*.$

Similarly, the probability $P_1$ of the simultaneous fulfillment of the relations (34) and of $v_{i_{rm}} = 1$ is given by

(41) $P_1 = \displaystyle\int_0^1\int_0^b \sum_{m^*} A(a, b, m^*)\, n'g'(a)(1-b)^{m'}(1-g(b))^{n'-1}\,da\,db.$

Then

(42) $\dfrac{P\{v_{i_{rm}} = 0\}}{P\{v_{i_{rm}} = 1\}} = \dfrac{P_0}{P_1}.$

Let $n_0 = \sum_{i > [\gamma(m+n)]} v_i$ and $m_0 = m + n - [\gamma(m+n)] - n_0$. The variables

$$(z_{i_{rm}} - a_{\lambda_{rm}}), \qquad (z_{[\gamma(m+n)]} - a_\gamma), \qquad \left(\frac{m_0}{n_0} - \frac{\alpha(1-a_\gamma)}{1-g(a_\gamma)}\right)$$

converge stochastically to zero.

Let $P_0(\epsilon)$ and $P_1(\epsilon)$ denote the values of the right members of (37) and (41), respectively, under two restrictions. The integration is restricted to the region where $a \le b$, $|a - a_{\lambda_{rm}}| < \epsilon$ and $|b - a_\gamma| < \epsilon$. The summation is restricted to those values of $m^*$ for which $\left|\dfrac{m'(1-g(a_\gamma))}{n'(1-a_\gamma)} - \alpha\right| < \epsilon$. Hence, because of the aforementioned stochastic convergence, for all sufficiently large m

(43) $|P_e(\epsilon) - P_e| < \epsilon P_e \qquad (e = 0, 1).$

Since $P_e > 0$, for sufficiently large m also

(44) $\left|\dfrac{P_0(\epsilon)}{P_1(\epsilon)} - \dfrac{P_0}{P_1}\right| < \epsilon.$

Since $g(x)$ and $g'(x)$ are continuous in the interval [0, 1] and hence uniformly continuous, it is clear that

(45) $\left|\dfrac{P_0(\epsilon)}{P_1(\epsilon)} - \dfrac{\alpha}{g'(a_{\lambda_{rm}})}\right| < c\,\epsilon,$

where c is a fixed constant independent of m. From (44) and (45) it follows easily that, for any arbitrarily small $\epsilon'$,

(46) $\left|\dfrac{P_0}{P_1} - \dfrac{\alpha}{g'(a_{\lambda_{rm}})}\right| < \epsilon'$

for sufficiently large m. Since $P\{v_{i_{rm}} = 1\} = \dfrac{1}{1 + P_0/P_1}$, the required relation (31) follows. This completes the proof of Lemma 1.

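Lemma 2. If conditions a, b, and c of Lemma 1 are satisfied, then

(47) $\lim_{m\to\infty} E\left(\dfrac{U}{m}\right) = 2\displaystyle\int_0^1 \frac{g'(x)}{\alpha + g'(x)}\,dx$

and

(48) $\lim_{m\to\infty} \sigma^2\left(\dfrac{U}{m}\right) = 0.$

Proof: Since

(49) $\dfrac{U}{m} = \dfrac{1 + 2n - v_1 - v_{m+n}}{m} - \dfrac{2}{m}\displaystyle\sum_{j=2}^{m+n} v_{j-1}v_j,$

we have from Lemma 1,

(50) $E\left(\dfrac{U}{m}\right) = \dfrac{2}{m}\displaystyle\sum_{j=2}^{[\gamma(m+n)]}\left[\frac{g'(a_{jm})}{\alpha + g'(a_{jm})} - \left(\frac{g'(a_{jm})}{\alpha + g'(a_{jm})}\right)^2\right] + \eta_1(m) + \eta_2(\gamma) = \dfrac{2}{m}\displaystyle\sum_{j=2}^{[\gamma(m+n)]}\frac{\alpha\,g'(a_{jm})}{(\alpha + g'(a_{jm}))^2} + \eta_1(m) + \eta_2(\gamma),$

where

(51) $\lim_{m\to\infty}\eta_1(m) = \lim_{\gamma\to 1}\eta_2(\gamma) = 0$

and $a_{jm}$ is the root of the equation

(52) $m\,a_{jm} + n\,g(a_{jm}) = j \qquad (j = 2, \dots, m+n).$

From equation (52) it follows that

(53) $\lim_{m\to\infty}(a_{jm} - a_{(j-1)m})(m + n\,g'(a_{jm})) = 1$

uniformly in j. Since γ may be chosen arbitrarily near to 1, the required result (47) follows easily from (50).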

It remains to consider the variance of $U/m$. The expression

$$\frac{1 + 2n - v_1 - v_{m+n}}{m}$$

differs from $2/\alpha$ by at most $1/m$, so that its variance converges to zero as $m \to \infty$.

In order to prove (48), it will be sufficient to show that the variance of

(54) $W = \dfrac{1}{m}\displaystyle\sum_{j=2}^{m+n} v_{j-1}v_j$

goes to zero with increasing m. From Lemma 1 it follows that

(55) $-z(m) < E(v_iv_jv_kv_l) - E(v_iv_j)E(v_kv_l) < z(m),$

where $\lim_{m\to\infty}|z(m)| = 0$, provided only that the integers i, j, k, l are distinct and $< \gamma(m+n)$. The variance of mW is the sum of terms of the type occurring in (55). The number of terms for which i, j, k, l are distinct is of the order $m^2$. All other terms are of size at most 2, and their number is of the order m. Since the number γ may be chosen arbitrarily near to 1, the variance of W converges to zero as $m \to \infty$.

This proves Lemma 2.

Lemma 3. If conditions a, b, and c of Lemma 1 are fulfilled, and if (24) holds, then

(56) $\displaystyle\int_0^1 \frac{g'(x)}{\alpha + g'(x)}\,dx < \frac{1}{1+\alpha}.$

Let $a_1 < a_3$ be any two real numbers and designate $\dfrac{a_1 + a_3}{2}$ by $a_2$. Let $F(x)$ be defined as follows:

(57) $F(a_1) = 0, \qquad F(x) = (x - a_i)\,b_i + F(a_i) \qquad (a_i \le x \le a_{i+1};\ i = 1, 2).$

Let c be defined by

(58) $F(a_3) = c\,(a_3 - a_1).$

Then it is easy to verify that the maximum of

(59) $T = \displaystyle\int_{a_1}^{a_3} \frac{F'(x)}{\alpha + F'(x)}\,dx$

with respect to $b_1$ and $b_2$, subject to the restrictions that $b_1$ and $b_2$ be non-negative and that $a_1$, $a_3$ and c be fixed $(c > 0)$, occurs when and only when

(60) $b_1 = b_2 = c.$

Now define

(61) $\beta_{0j} = 0, \quad \beta_{ij} = \dfrac{i}{2^j}, \quad \gamma_{ij} = \dfrac{g(\beta_{ij}) - g(\beta_{(i-1)j})}{2^{-j}}, \quad S_j = \displaystyle\sum_{i=1}^{2^j}\frac{1}{2^j}\,\frac{\gamma_{ij}}{\alpha + \gamma_{ij}} \qquad (i = 1, 2, \dots, 2^j;\ j = 0, 1, 2, \dots).$

Repeated application of the result of the previous paragraph easily gives

(62) $S_0 \ge S_1 \ge S_2 \ge \cdots.$

From (24) it follows that there exists a positive integer J' such that $S_{J'} < S_0$. Obviously

(63) $S_0 = \dfrac{1}{1+\alpha}$

and

(64) $\lim_{j\to\infty} S_j = \displaystyle\int_0^1 \frac{g'(x)}{\alpha + g'(x)}\,dx.$

Hence Lemma 3 is proved.
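Lemmas 2 and 3 can be illustrated numerically. As an assumption for the sketch, take $\alpha = 1$ (m = n) and the hypothetical $g(x) = (x + 3x^2)/4$ on [0, 1]. This g satisfies conditions a, b and c, with $g'(x) = (1 + 6x)/4$. Then (47) gives $2\int_0^1 (1+6x)/(5+6x)\,dx = 2 - \tfrac{4}{3}\log\tfrac{11}{5} \approx 0.949$. This is below $2/(1+\alpha) = 1$, as Lemma 3 requires. The code checks the integral by the midpoint rule and compares it with U/m from one large simulated sample, drawing the y's by inverting g. Names and the seed are ours.

```python
import math
import random

def g_inverse(u):
    """Inverse of the hypothetical g(y) = (y + 3*y**2)/4 on [0, 1]; g'(y) = (1 + 6y)/4 > 0."""
    return (-1.0 + math.sqrt(1.0 + 48.0 * u)) / 6.0

def limit_47(alpha, steps=100000):
    """2 * integral_0^1 g'(x)/(alpha + g'(x)) dx by the midpoint rule."""
    h = 1.0 / steps
    total = 0.0
    for i in range(steps):
        d = (1.0 + 6.0 * (i + 0.5) * h) / 4.0       # g' at the midpoint of the i-th cell
        total += d / (alpha + d)
    return 2.0 * total * h

def simulated_u_over_m(m, n, seed=2):
    """U/m for m uniform x's (f(x) = x) and n y's drawn from g by inversion."""
    rng = random.Random(seed)
    labelled = [(rng.random(), 0) for _ in range(m)] + [(g_inverse(rng.random()), 1) for _ in range(n)]
    labelled.sort()
    v = [label for _, label in labelled]
    runs = 1 + sum(a != b for a, b in zip(v, v[1:]))
    return runs / m
```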



Proof of Theorem II: Let $\delta_1 > \delta_2 > \dots > \delta_j > \dots$ be an arbitrary but fixed sequence such that $\lim \delta_j = 0$. For $\delta = \delta_j$, let $I_{1j}, \dots, I_{k(j)j}$ be a set of closed intervals such that no two intervals have an interior point in common and within which, by condition (A), $f'(x)$ and $g'(x)$ exist and are positive and continuous. Let $I_{0j}$ be the complementary set (with respect to the whole line). (It is easy to see that, if condition (A) is fulfilled, such a system can be constructed.) Let $U_{ij}$ $(i = 1, 2, \dots, k(j))$ and $U_{0j}$ denote, respectively, the numbers of runs caused by the observations which fall in the intervals $I_{ij}$ and in $I_{0j}$. Then

(65) $\left|U - \displaystyle\sum_{i=1}^{k(j)} U_{ij} - U_{0j}\right| \le 2k(j).$
From condition (A) it follows that, with a probability arbitrarily close to 1, for sufficiently large m,

(66) $U_{0j} < 3\rho\, m\,\delta_j \qquad (j = 1, 2, \dots),$

where $\rho = \max(1, 1/\alpha)$.

Let $[a_i \le x \le b_i]$, $i = 1, 2, \dots$, denote the interval $I_i$, and let $m_i$ and $n_i$ denote the number of observations on X and Y, respectively, which fall in the interval $I_i$. Then $m_i/m$ and $n_i/n$ converge stochastically with increasing m to $[f(b_i) - f(a_i)]$ and $[g(b_i) - g(a_i)]$, respectively.

Within the interval $I_i$ $(i = 1, 2, \dots, k)$ we perform the transformation

(67) $X^* = f(X), \qquad Y^* = f(Y),$

which leaves $U_i$ invariant. For fixed $m_i$, $n_i$ the relative distribution of $X^*$ is uniform and the relative distribution of $Y^*$ fulfills condition (c) of Lemma 1.

Hence from Lemma 2 we obtain that $U_i/m$ converges stochastically to $\lim_{m\to\infty} E(U_i/m)$, and

(68) $\lim_{m\to\infty} E\left(\dfrac{U_i}{m}\right) \le \dfrac{2[f(b_i) - f(a_i)][g(b_i) - g(a_i)]}{\alpha[f(b_i) - f(a_i)] + [g(b_i) - g(a_i)]}.$

It can be verified that the sum of the second members in (68) over all values of i is less than or equal to $\dfrac{2}{1+\alpha}$.

From (24) and condition (A) we get that, for sufficiently small $\delta_j$, there exists at least one interval for which the first member of (68) is less than the second member. Hence

(69) $S < \dfrac{2}{1+\alpha},$

where

(70) $S = \displaystyle\sum_{i=1}^{k(j)} \lim_{m\to\infty} E\left(\frac{U_i}{m}; f, g\right).$

Now take j so large that

(71) $3\rho\,\delta_j < \epsilon,$

where

(72) $0 < 3\epsilon < \dfrac{2}{1+\alpha} - S.$
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Since — convei^es stochastically to its expected value, from (66), (66), (70), 
m 

(71), and (72), it follows that, with a probability arbitrarily close to 1, for suffi- 
ciently large m. 


(73) ^ < 

m 

From (23) and Theorem I we get 


2 

1 + a 


<. 


(74) 


lim 

in'"** 


uo(m) 


2 


m 1 -1- a’ 
Theorem II follows easily from (73) and (74). 
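As an informal check on the limits used in this proof (not part of the original argument), the Python sketch below estimates E(U)/m by simulation. It assumes that α denotes lim m/n, the reading under which the exact null expectation E(U) = 1 + 2mn/(m+n) gives E(U)/m → 2/(1+α). The uniform X sample, the alternative Y = V³ (V uniform), and the helper names `count_runs` and `mean_runs_ratio` are our own illustrative choices.

```python
import random

def count_runs(xs, ys):
    """Number of runs U in the pooled, sorted sequence of X and Y observations."""
    # Tag each value with its sample of origin (0 for X, 1 for Y), then sort by value.
    labels = [lab for _, lab in sorted([(v, 0) for v in xs] + [(v, 1) for v in ys])]
    # A new run begins every time the sample label changes.
    return 1 + sum(1 for a, b in zip(labels, labels[1:]) if a != b)

def mean_runs_ratio(m, n, sample_y, trials=200, seed=0):
    """Monte Carlo estimate of E(U)/m with X ~ Uniform(0, 1) and Y drawn by sample_y."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        xs = [rng.random() for _ in range(m)]
        ys = [sample_y(rng) for _ in range(n)]
        total += count_runs(xs, ys)
    return total / (trials * m)

m, n = 400, 800
alpha = m / n  # assumption: alpha = lim m/n
print("2/(1+alpha):", 2 / (1 + alpha))
print("null, Y uniform:", mean_runs_ratio(m, n, lambda r: r.random()))
print("alternative, Y = V**3:", mean_runs_ratio(m, n, lambda r: r.random() ** 3))
```

Under the null hypothesis the estimate sits close to 2/(1+α) ≈ 1.333; under the alternative it falls clearly below that value, which is the behavior that (73) and (74) turn into consistency of the test.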


8. Remarks on a proposed test. We have already remarked in Section 3 that the test proposed by W. R. Thompson is not consistent. To show this, we shall give two distribution functions f(x) and g(x) such that, although these functions are very different, the probability of rejecting the hypothesis that they are the same will not approach one as the sample number approaches infinity.

Suppose, to simplify the notation, that the observations have been ordered according to size, i.e., that $x_1 < x_2 < \cdots < x_m$ and $y_1 < y_2 < \cdots < y_n$. Suppose further that $m = n$, and that the test is to be made on a level of significance $\beta > 0$. In the right member of (2) we need not exhibit n and shall replace k and k' by k(m) and k'(m) to show the dependence on m. We have, under the null hypothesis,

$$P\{x_{k(m)} < y_{k'(m)}\} = \phi(m, k(m), k'(m)) = \beta. \qquad (75)$$


The sequence $k(m)/m$ is bounded, so that there exists a monotonically increasing subsequence $m_1, m_2, \ldots$ of the sequence of integers $1, 2, \ldots$ and a number h, $0 \le h \le 1$, such that

$$\lim_{i\to\infty} \frac{k(m_i)}{m_i} = h. \qquad (76)$$

It is easy to see that then also

$$\lim_{i\to\infty} \frac{k'(m_i)}{m_i} = h. \qquad (77)$$

We shall now assume that $0 < h < 1$. If $h = 0$ or 1 only a trivial alteration will be needed in the argument to follow. Let ε and δ be arbitrarily small positive numbers. We now consider two populations, A and B, described as follows:

A) $f(x) = g(x) = x \qquad (0 \le x \le 1)$,

B) $f(x) = x \qquad (0 \le x \le 1)$,

$$g(x) = g(a_i) + \frac{(x - a_i)\,[g(a_{i+1}) - g(a_i)]}{a_{i+1} - a_i} \qquad (a_i \le x \le a_{i+1};\ i = 0, 1, \ldots, 4),$$

where

$$a_0 = 0, \quad a_1 = h - 2\delta > 0, \quad a_2 = h - \delta, \quad a_3 = h + \delta, \quad a_4 = 1 - \delta, \quad a_5 = 1,$$
$$g(a_0) = 0, \quad g(a_1) = 0, \quad g(a_2) = a_2, \quad g(a_3) = a_3, \quad g(a_4) = \ldots, \quad g(a_5) = 1.$$


The definition of f(x) and g(x) outside the interval $0 \le x \le 1$ is obvious. It will be shown that even for such different populations as A and B, and for samples of size greater than any arbitrarily assigned number, the probability of rejecting the null hypothesis when B is true will be at most $\beta + \epsilon$.

Let $h_1, h_2, h_3$ denote the numbers of observations on X which fall in the intervals $0 < x < a_2$, $a_2 < x < a_3$, $a_3 < x < 1$, respectively (m fixed, of course). Let $h'_1, h'_2, h'_3$ be the corresponding numbers for Y. For a fixed m, the probability of a set $h_1, h_2, h_3, h'_1, h'_2, h'_3$ is the same whether the sample be drawn from the population A or B. From (76), (77), and the multinomial law it follows that for all sufficiently large $m_i$ the probability is at least $1 - \epsilon$ of the occurrence of a set $h_1, h_2, h_3, h'_1, h'_2, h'_3$ for which $x_{k(m_i)}$ and $y_{k'(m_i)}$ will both fall in the interval $a_2 < x < a_3$. Furthermore, it is obvious that for all samples with fixed $h_2, h'_2$ the distribution within the interval $a_2 < x < a_3$ is the same whether the sample came from the population A or B. Hence even when the sample is drawn from the population B, the first member of (75) is $\le \beta + \epsilon$. This completes the proof of the inconsistency of the test based on (75).

This test is consistent if the alternatives to the null hypothesis are limited, for example, to those where $g(x) \equiv f(x + c)$, c a constant.
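The construction can be explored numerically. The sketch below assumes h = 0.5, δ = 0.05, m = n = 2000 and k = k' = hm; these are hypothetical choices, and Thompson's actual k(m) and k'(m) are not reproduced here. It also takes g(a_1) = 0 and g(a_4) = 1 as our reading of population B; any nondecreasing choice that keeps g(a_2) = a_2 and g(a_3) = a_3 serves the argument equally well. The Y sample for B is obtained by pushing the same uniform draws used for A through the inverse of g, so the two rejection frequencies can differ only on samples whose order statistics leave the interval a_2 < x < a_3.

```python
import random

def make_inverse_cdf(knots, values):
    """Inverse of the continuous piecewise-linear CDF through the points (knots[i], values[i])."""
    def inverse(u):
        for a, b, ga, gb in zip(knots, knots[1:], values, values[1:]):
            if gb > ga and u <= gb:  # flat pieces carry no probability and are skipped
                return a + (u - ga) * (b - a) / (gb - ga)
        return knots[-1]
    return inverse

h, delta = 0.5, 0.05                                                 # hypothetical choices
knots = [0.0, h - 2 * delta, h - delta, h + delta, 1 - delta, 1.0]  # a_0, ..., a_5
g_values = [0.0, 0.0, h - delta, h + delta, 1.0, 1.0]               # assumed g(a_0), ..., g(a_5)
g_inverse = make_inverse_cdf(knots, g_values)

def simulate(m, k, k_prime, trials=200, seed=1):
    """Frequencies of x_(k) < y_(k') under A and under B, plus how often both
    order statistics (under B) fall in a_2 < x < a_3. Uses m = n as in the text."""
    rng = random.Random(seed)
    a2, a3 = knots[2], knots[3]
    reject_a = reject_b = inside = 0
    for _ in range(trials):
        x_k = sorted(rng.random() for _ in range(m))[k - 1]
        u = sorted(rng.random() for _ in range(m))[k_prime - 1]
        y_a, y_b = u, g_inverse(u)  # g_inverse is increasing, so order statistics map to order statistics
        reject_a += x_k < y_a
        reject_b += x_k < y_b
        inside += (a2 < x_k < a3) and (a2 < y_b < a3)
    return reject_a / trials, reject_b / trials, inside / trials

print(simulate(2000, 1000, 1000))
```

Both rejection frequencies come out essentially equal even though A and B differ sharply outside the middle interval, which is the inconsistency established above.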
THE SUBSTITUTIVE MEAN AND CERTAIN SUBCLASSES OF THIS

GENERAL MEAN 

By Edward L. Dodd 

1. Introduction. No general agreement has been reached, so far as I know, 
as to what constitutes a mean. A necessary condition which appears to meet 
with general approval is that a single-valued mean of a set of numbers all equal 
to a constant c should itself be equal to c. However, there appears to be some 
valid objection against imposing any other proposed condition as necessary. 

Of course, intermediacy is a condition that suggests itself at once. Indeed, in certain mean value theorems in general analysis, such as the First Theorem of the Mean for integral calculus, which I mention in Section 3, intermediacy is the main feature.

However, O. Chisini [1] insisted that intermediacy or internality is not the
chief characteristic of a statistical mean. Rather, a mean is a number to take 
the place, by substitution, of each of a set of numbers in general different. 
Such a mean may well be called a representative or substitutive mean. 

Chisini defined m to be a mean of $x_1, x_2, \ldots, x_n$ relative to a function F, provided that

$$F(m, m, \ldots, m) = F(x_1, x_2, \ldots, x_n). \qquad (1.1)$$

If, for example,

$$F(x_1, x_2, \ldots, x_n) = \sum x_i^2 = \sum m^2 = nm^2, \qquad (1.2)$$

the mean m thus obtained is the root-mean-square

$$m = \pm\left[(1/n)\sum x_i^2\right]^{1/2}. \qquad (1.3)$$

The choice of F, Chisini noted, depended upon the use to be made of the mean.

Suppose now that $f(x_1, x_2, \ldots, x_n)$ is such a function that one value of

$$f(x, x, \ldots, x) = x. \qquad (1.4)$$

And suppose that this f is taken as a particular F for (1.1) to determine a mean m implicitly, thus

$$f(m, m, \ldots, m) = f(x_1, x_2, \ldots, x_n). \qquad (1.5)$$

Then, from (1.5) and (1.4) it follows that one value of

$$f(x_1, x_2, \ldots, x_n) = m. \qquad (1.6)$$

And thus f determines the mean m both explicitly and implicitly.
It should be noted that the $F = \sum x_i^2$ in (1.2) is not itself a mean of the $x_i$.

If, in (1.2), we take $x_1 = -2$, $x_2 = 1$, $x_3 = 1$, then the double-valued mean $m = \pm 2^{1/2}$ results. Now $-2^{1/2}$ is internal, i.e., $-2 < -2^{1/2} < 1$; but $2^{1/2}$ is external, for $2^{1/2} > 1 > -2$. But since here $\sum x_i = 0$, it follows also that the standard deviation of $-2, 1, 1$ is the external mean $2^{1/2}$. Chisini [1], indeed, used the root mean square to show the possibility of external means. External means have been noted by other writers [2-7].
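For F = Σx_i² the defining equation (1.1) reduces to nm² = Σx_i², so both Chisini means can be written down directly. A small Python check of the example above (the helper name is ours):

```python
import math

def chisini_means_sum_of_squares(xs):
    """Both roots m of (1.1) with F = sum of squares: n * m**2 = sum(x_i**2)."""
    root = math.sqrt(sum(x * x for x in xs) / len(xs))
    return -root, root

xs = [-2, 1, 1]
low, high = chisini_means_sum_of_squares(xs)
print(low, high)                    # -sqrt(2) and +sqrt(2)
print(min(xs) <= low <= max(xs))    # the negative root is internal
print(min(xs) <= high <= max(xs))   # the positive root is external
```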

It is noteworthy that a number of writers [6-12] have used the condition (1.4) (in general, with f single-valued) as one of a set of axioms to characterize particular means. Sometimes this has appeared in the weaker form $f(1, 1, \ldots, 1) = 1$.

This paper will be concerned primarily with the mean of a finite number n of variates, $x_1, x_2, \ldots, x_n$. Possible generalizations will be mentioned briefly in Section 8.

In the conception of the substitutive mean, m, as I have been using it for some time, emphasis is laid upon the explicit form for m; and provision is made for multiple values.

Definition of the Substitutive Mean. Let $f(x_1, x_2, \ldots, x_n)$ be a function of n variables $x_1, x_2, \ldots, x_n$ defined at least for one set of equal values, $x_i = k$. If c is any number such that $f(c, c, \ldots, c)$ is defined, let one value of

$$f(c, c, \ldots, c) = c. \qquad (1.7)$$

Then $f(x_1, x_2, \ldots, x_n)$ will be said to be a substitutive mean of $x_1, x_2, \ldots, x_n$.

If an original formulation of a problem does not assign to a function a value 
when the variables are all equal, it is sometimes possible to assign such values 
by continuity considerations, such as are commonly used in the “evaluation” 
of indeterminate forms. This will be discussed in Section 6. 

In the following, when the word mean is used, it will designate the substitu- 
tive mean as defined above. 

2. Classification of Means already made. Some general classes of means have already been distinguished. One important basis for a classification of means is the kind of data to be used. The data may be only qualitatively distinguishable. Then numbers may be assigned to qualities. For dealing in a very general way with all kinds of data, C. Gini and L. Galvani [13], and G. Pietra [14], distinguished between data in rectilineal series, in cyclical series, and in unconnected series. These three classes are associated respectively with the straight line, the circle, and a regular polyhedron (in three dimensions, the regular tetrahedron, and in n dimensions, a polyhedron with n + 1 vertices each at the same distance from each of the other n vertices).

For one definition of the arithmetic mean of a cyclical series, Gini uses the center of gravity principle; and this mean is computed with the aid of sines and cosines. By mechanical means, such an arithmetic mean of dates (for example, of dates of weddings) as days of a year can be found. On the rim of a wheel delicately suspended and marked off for the 365 days or 366 days of a year, let small weights proportional to the number of weddings on a day be placed in the spaces assigned to the individual days. Then when the wheel comes to rest, the arithmetic mean of the dates will be found at the lowest point of the rim. In the special case where the center of gravity of the system is at the center of the circle, the mean is indeterminate, or we may say that every day is a mean day.
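The wheel construction is equivalent to averaging points on a circle: day d is placed at angle 2πd/365 (or 366), the weighted center of gravity is computed from cosines and sines, and its direction is read back as a day. A minimal Python sketch, with a function name and 0-based day indexing of our own choosing:

```python
import math

def circular_mean_day(counts, days_in_year=365, tol=1e-12):
    """Center-of-gravity mean of dates placed around a circle.

    counts maps a 0-based day index to a nonnegative weight (for example,
    the number of weddings on that day). Returns the mean as a fractional day
    index, or None when the center of gravity sits at the center of the circle,
    in which case every day may be called a mean day.
    """
    total = sum(counts.values())
    cx = sum(w * math.cos(2 * math.pi * d / days_in_year) for d, w in counts.items()) / total
    cy = sum(w * math.sin(2 * math.pi * d / days_in_year) for d, w in counts.items()) / total
    if math.hypot(cx, cy) < tol:
        return None
    angle = math.atan2(cy, cx) % (2 * math.pi)  # direction of the center of gravity
    return angle * days_in_year / (2 * math.pi)

print(circular_mean_day({360: 1, 4: 1}))                    # about 364.5, not the linear average 182
print(circular_mean_day({0: 1, 183: 1}, days_in_year=366))  # None: indeterminate
```

The first call shows why a linear average of day numbers is unsuitable for cyclical data; the case where the center of gravity falls at the center of the circle is reported as `None`, matching the indeterminate case in the text.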

Also, for cyclical series the arithmetic mean and the median are defined by 
other methods, using such principles as minimizing the sum of the squares of 
deviations or the sum of the absolute deviations. 

The properties of means may be made the basis of a classification, either those 
properties which have been evolved by writers [8-12], [15-18] who have char- 
acterized specific means by sets of axioms, or those properties which seem of 
special importance in making distinctions. Two such properties will now be 
mentioned. 

Gini [19] recognizes two large classes of means: “A) medie ferme [firm means], B) medie lasche [loose means],” the latter class including the median and mode, whose values do not depend upon all the data. To describe this latter mean m of arguments $x_i$, we might write $\partial m/\partial x_i = 0$ as applying to several if not most of the arguments over wide ranges instead of at isolated points.

Subclasses of A or firm means as given by Gini will be discussed in Section 4. 

Another rather large classification distinguishes between simple means and their weighted forms. In a case often encountered, where the weights are whole numbers indicating frequencies of occurrence, this distinction is of little significance. In the more general case, however, where weights may give ratings of the efficiency of measuring instruments or the weights may be negative [6, 20], more direct attention needs to be paid to the weighted forms.

To supplement classifications already proposed, I am indicating in the next section a descent from the substitutive mean, the most general of all means, down through two classes of means less general, which I am calling the summational mean and the quasi-arithmetic mean, to the more specific mean known as the associative mean, studied in particular by M. Nagumo [21], A. Kolmogoroff [22], and B. de Finetti [2].

The foregoing subclasses of the general or substitutive mean are based 
primarily on structure, the way the mean is formed. 

3. The Summational Mean, Quasi-Arithmetic Mean, and Associative Mean. 

The summational mean, now to be defined, is a generalization of the weighted arithmetic mean

$$W = \frac{c_1x_1 + c_2x_2 + \cdots + c_nx_n}{c_1 + c_2 + \cdots + c_n}, \qquad \sum c_i \neq 0. \qquad (3.1)$$
It is to be noted that although W is not a symmetric function of the $x_i$, W is a symmetric function of the $c_ix_i$. In the generalization Q, the following features of W are retained:

1. Certain weights $c_i$ being given, Q is a symmetric function of the $c_ix_i$.

2. This Q may be determined from sums of n terms, each term involving one and only one $x_i$.

Definition. Let Σ denote a summation for $i = 1, 2, \ldots, n$. Suppose that

$$F\left[y, \sum f_1(c_ix_i, y), \sum f_2(c_ix_i, y), \ldots, \sum f_k(c_ix_i, y)\right] = 0 \qquad (3.2)$$

has a solution $y = Q$ which is a substitutive mean of $x_1, x_2, \ldots, x_n$. Then Q will be called a summational mean of $x_1, x_2, \ldots, x_n$, relative to the functions $f_1, f_2, \ldots, f_k$, and F.

Sometimes it is possible to express Q as

$$Q = G\left[\sum g_1(c_ix_i), \sum g_2(c_ix_i), \ldots, \sum g_k(c_ix_i)\right]. \qquad (3.3)$$

Among summational means, those of most frequent use involve in a special way but one summation. Thus, with ψ(x) a function which would usually be taken as continuous, this m satisfies

$$\psi(m)\sum c_i = \sum c_i\psi(x_i). \qquad (3.4)$$

But this, with $c_i > 0$, is just an algebraic analogue or prologue to the First Theorem of the Mean for integral calculus, the $c_i$ to be replaced by a positive integrable function. Without further specification, this mean m may have an uncountably infinite number of values. But if it be required that ψ(x) be a continuous increasing function, and that $c_i > 0$, then m is unique.

In a series of papers, C. E. Bonferroni [20], [23-27] used means such as m in (3.4) for statistical and actuarial problems. And, as he had in mind [28] distinctly the notion of substitution, he was in a sense a forerunner of Chisini. E. L. Dodd [29] made use of a mean m defined with the aid of n continuous increasing functions $\psi_i(x)$, thus:

$$\sum c_i\psi_i(m) = \sum c_i\psi_i(x_i), \qquad c_i > 0. \qquad (3.5)$$

If $g_i(x) = c_i\psi_i(x)$, this can be written

$$\sum g_i(m) = \sum g_i(x_i). \qquad (3.6)$$

In one paper, C. E. Bonferroni [20], as already noted, used weights which might be either positive or negative.

Some such mean as m in (3.4) has been used by a number of writers. Here ψ(m) is a weighted arithmetic mean of the $\psi(x_i)$; and thus it is natural to call m a quasi-arithmetic mean of the $x_i$.

Definition. Let $\sum c_i \neq 0$. If m is a solution of

$$\psi(m)\sum c_i = \sum c_i\psi(x_i), \qquad (3.4)$$

then m will be called a quasi-arithmetic mean of the $x_i$, with weights $c_i$, and relative to the function ψ(x).

Sufficient conditions for the existence of this mean m are: (1) that ψ(x) be continuous in the interval I, finite or infinite, in which the observations $x_i$ lie; (2) that either $c_i > 0$ for each i, or that ψ(x) take on all real values as x runs through I.

It will be helpful to picture geometrically the double transformation or mirroring represented by (3.4). Points $x_i$ on the horizontal axis are carried vertically to the curve $y = \psi(x)$ and then reflected horizontally to the y-axis. For the points $y_i$ on the y-axis thus obtained, the arithmetic mean $\bar{y}$ or “center of gravity” is obtained. Then $\bar{y}$ is carried horizontally to the curve and reflected vertically to the x-axis. The abscissas m of points on the x-axis thus obtained are means of the given $x_i$, relative to this ψ(x).
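The mirroring picture translates directly into a computation: map each x_i to ψ(x_i), take the weighted arithmetic mean on the y-axis, and map back. The Python sketch below avoids needing an explicit inverse of ψ by solving (3.4) with bisection. It assumes positive weights and a continuous, strictly increasing ψ on an interval containing all the x_i, the conditions under which the text guarantees a unique m; the function name and bracket arguments are our own.

```python
import math

def quasi_arithmetic_mean(xs, weights, psi, lo, hi, iters=200):
    """Solve psi(m) * sum(c_i) = sum(c_i * psi(x_i)), equation (3.4), by bisection.

    Assumes positive weights and psi continuous and strictly increasing on
    [lo, hi], an interval containing every x_i, so the root m is unique.
    """
    target = sum(c * psi(x) for c, x in zip(weights, xs)) / sum(weights)  # mean on the y-axis
    for _ in range(iters):
        mid = (lo + hi) / 2
        if psi(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2  # carried back to the x-axis

xs, cs = [1.0, 2.0, 4.0], [1.0, 1.0, 1.0]
print(quasi_arithmetic_mean(xs, cs, lambda x: x, 1.0, 4.0))         # arithmetic, 7/3
print(quasi_arithmetic_mean(xs, cs, math.log, 1.0, 4.0))            # geometric, 2
print(quasi_arithmetic_mean(xs, cs, lambda x: -1.0 / x, 1.0, 4.0))  # harmonic, 12/7
```

Choosing ψ(x) = -1/x rather than 1/x keeps ψ increasing, so the same bisection rule serves for the harmonic mean.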

It may happen (Dodd [3, p. 746]) that the curve $y = \psi(x)$ contains horizontal segments, as in the curve for the temperature y of ice-water-steam which has absorbed a quantity x of heat. In this case the mean m may be an “interval,” an uncountable set of real numbers. Indeterminateness over an interval is a well known feature of the median of an even number of variates. In fact, a paper of D. Jackson [30] was for the purpose of indicating one method of selecting a single value from this interval of indeterminateness, as a median.

It may be noted that a mean of n variables becomes, when n = 1, a function 
of a single variable; and thus it appears possible to implant in a mean of n 
variables almost any peculiarity found in a function of one variable. 

A special case of the quasi-arithmetic mean is the associative mean m, which under some general conditions has been shown [2, 21, 22] to satisfy

$$n\psi(m) = \sum\psi(x_i), \qquad i = 1, 2, \ldots, n; \qquad (3.7)$$

where ψ(x) is a continuous increasing function.

If $f_n(x_1, x_2, \ldots, x_n)$ is an associative mean, then by definition $f_n(x_1, x_2, \ldots, x_n)$ is unaltered when any k of the n variates are each replaced by the mean $f_k$ of that set.
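The associativity property can be checked numerically for a mean of the form (3.7). The Python sketch below takes ψ(x) = x³ (an arbitrary choice of ours) and replaces the first k variates by their own mean f_k; the helper names are hypothetical.

```python
def associative_mean(xs, psi, psi_inv):
    """Mean m defined by n * psi(m) = sum(psi(x_i)), equation (3.7)."""
    return psi_inv(sum(psi(x) for x in xs) / len(xs))

def replace_first_k(xs, k, psi, psi_inv):
    """Replace x_1, ..., x_k by their own mean f_k, leaving the other variates alone."""
    m_k = associative_mean(xs[:k], psi, psi_inv)
    return [m_k] * k + list(xs[k:])

def cube(x):
    return x ** 3

def cube_root(y):
    return y ** (1.0 / 3.0)  # adequate here because all values are positive

xs = [1.0, 3.0, 4.0, 9.0]
full = associative_mean(xs, cube, cube_root)
for k in range(1, len(xs) + 1):
    # For every k the overall mean is unchanged, up to rounding.
    print(k, full, associative_mean(replace_first_k(xs, k, cube, cube_root), cube, cube_root))
```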

4. The Gini means as summational. Having distinguished firm means from loose means, Gini [19] noted that in the former class, a variate might appear as a base, as an exponent, or both as base and exponent. In general, these variates are to be positive. Gini then listed ten means of a decidedly broad character, some of them generalizing the combinatorial means treated by A. Durand [31] and O. Dunkel [32]. See also G. Pietra [37].

These ten means involve only the four simple arithmetic operations and root extraction. For many purposes they are best expressed in the form given by the author. However, to show that these means are summational, logarithms will be used to reduce products to sums.
Let

$S^p = \sum x_i^p$, $i = 1, 2, \ldots, n$;

${}_nC_c = n!/c!(n - c)!$, a binomial coefficient;

$P_c$ be any one of the ${}_nC_c$ products of c different elements taken from

$$x_1, x_2, \ldots, x_n; \qquad (4.1)$$

$P_c^p = (P_c)^p$, the pth power of $P_c$;

$Z_c = \sum P_c$, the sum of all the ${}_nC_c$ products $P_c$;

$Z_c^p = \sum P_c^p$.

In the expressions which follow, it is assumed that the denominators are not zero.

The ten means, as defined in Gini’s Equations I, II, ..., X, will be designated here by $m_1, m_2, \ldots, m_{10}$; and their logarithms, with base arbitrary, will now be given.

$$\begin{aligned}
\log m_1 &= (\log S^p - \log n)/p\\
\log m_2 &= (\log Z_c - \log {}_nC_c)/c\\
\log m_3 &= (\log Z_c^p - \log {}_nC_c)/cp\\
\log m_4 &= (\log S^p - \log S^q)/(p - q)\\
\log m_5 &= \textstyle\sum x_i^p \log x_i / S^p\\
\log m_6 &= (\log Z_c - \log Z_d - \log {}_nC_c + \log {}_nC_d)/(c - d)\\
\log m_7 &= (\log Z_c^p - \log Z_d^p - \log {}_nC_c + \log {}_nC_d)/(c - d)p\\
\log m_8 &= (\log Z_c^p - \log Z_c^q)/c(p - q)\\
\log m_9 &= \textstyle\sum P_c^q \log P_c / cZ_c^q\\
\log m_{10} &= (\log Z_c^p - \log Z_d^q - \log {}_nC_c + \log {}_nC_d)/(cp - dq)
\end{aligned} \qquad (4.2)$$

As noted by the author, the foregoing include some well known special means. Thus, $m_1$ is the power mean, which for $p = 1, 2, -1$ becomes respectively the arithmetic mean, the root mean square, and the harmonic mean. If $p \to 0$, then the limit of $m_3$ and of $m_7$ is the geometric mean. If $p = 0, 1, 2$, and $q = p - 1$, then $m_4$ is respectively the harmonic, the arithmetic, and the contraharmonic mean.
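A few of Gini's means can be evaluated straight from the logarithmic forms (4.2). The Python sketch below (helper names ours) computes m_1, m_3 and m_4 from the power sums S^p and the combinatorial sums Z_c^p, and reproduces some of the special cases just listed; m_3 at a very small p stands in for the limit p → 0.

```python
import itertools
import math

def power_sum(xs, p):
    """S^p = sum of x_i ** p."""
    return sum(x ** p for x in xs)

def z_sum(xs, c, p=1.0):
    """Z_c^p = sum over all c-element products P_c of P_c ** p."""
    return sum(math.prod(comb) ** p for comb in itertools.combinations(xs, c))

def gini_m1(xs, p):
    """Power mean: log m_1 = (log S^p - log n) / p."""
    return math.exp((math.log(power_sum(xs, p)) - math.log(len(xs))) / p)

def gini_m3(xs, c, p):
    """log m_3 = (log Z_c^p - log nCc) / (c p)."""
    return math.exp((math.log(z_sum(xs, c, p)) - math.log(math.comb(len(xs), c))) / (c * p))

def gini_m4(xs, p, q):
    """log m_4 = (log S^p - log S^q) / (p - q)."""
    return math.exp((math.log(power_sum(xs, p)) - math.log(power_sum(xs, q))) / (p - q))

xs = [1.0, 2.0, 4.0]
print(gini_m1(xs, 1), gini_m1(xs, -1))  # arithmetic 7/3 and harmonic 12/7
print(gini_m4(xs, 2, 1))                # contraharmonic (1 + 4 + 16)/(1 + 2 + 4) = 3
print(gini_m3(xs, 2, 1e-6))             # close to the geometric mean 2 (limit p -> 0)
```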

For each of the ten means, Gini gives an appropriate name. Those involving 
binomial coefficients are combinatorial, a mean like the contra-harmonic with 
denominator other than a constant is biplanar, the more simple means 
monoplanar. 

When, in the following, I show that certain combinatorial expressions may be replaced by sums, it is not implied that this replacement would simplify computation.

To prove that $m_1, m_2, \ldots, m_{10}$ are all summational means, it may be noted that n, p, q, c, d, ${}_nC_c$, and ${}_nC_d$ are constants. Moreover, $S^p$ is the symmetric sum of the pth powers of the $x_i$, thus with only one $x_i$ in each term, $i = 1, 2, \ldots, n$. And, since $Z_c$, $Z_c^p$, $Z_d$, and $Z_d^q$ are symmetric polynomials in the $x_i$, they may be expressed as polynomials in $S^1, S^2, \ldots$, by a well known theorem of algebra. Hence among the ten means, the only one that requires special attention is the ninth mean, $m_9$.

To show that $m_9$ is a summational mean, we need only examine the numerator of the right member. Let this numerator be N:

$$N = \sum P_c^q \log P_c. \qquad (4.3)$$

Then

$$qN = (x_1^q x_2^q \cdots x_c^q)(\log x_1^q + \cdots + \log x_c^q) + \cdots. \qquad (4.4)$$

Thus, if we set $y_i = x_i^q$, we may write

$$qN = (y_1 y_2 \cdots y_c)(\log y_1 + \cdots + \log y_c) + \cdots. \qquad (4.5)$$

The coefficient of $\log y_1$ in this right member is the sum of all products of c different factors which include $y_1$.

Now, let $Y_r$ be the sum of the products of r different factors taken from $y_1, y_2, \ldots, y_n$, and let $T_r$ be the sum of the products of r different factors taken from $y_2, y_3, \ldots, y_n$. Then it is evident that

$$Y_r = T_r + y_1 T_{r-1}; \qquad T_r = Y_r - y_1 T_{r-1}. \qquad (4.6)$$

If, now, we set $Y_0 = 1$, it follows that

$$T_{c-1} = Y_{c-1} - y_1 Y_{c-2} + y_1^2 Y_{c-3} - \cdots + (-1)^{c-1} y_1^{c-1} Y_0. \qquad (4.7)$$

Hence, in qN, the coefficient of $\log y_1$ is

$$y_1 T_{c-1} = y_1 Y_{c-1} - y_1^2 Y_{c-2} + \cdots + (-1)^{c-1} y_1^c Y_0. \qquad (4.8)$$

Thus in qN the terms containing $\log y_1$ are

$$Y_{c-1}\, y_1 \log y_1 - Y_{c-2}\, y_1^2 \log y_1 + \cdots + (-1)^{c-1} Y_0\, y_1^c \log y_1. \qquad (4.9)$$

Now let

$$U_r = \sum y_i^r \log y_i, \qquad i = 1, 2, \ldots, n. \qquad (4.10)$$

Then

$$qN = Y_{c-1}U_1 - Y_{c-2}U_2 + \cdots + (-1)^{c-1} Y_0 U_c. \qquad (4.11)$$

Thus, qN is here constructed from sums of n terms with but a single $y_i$ in any term.

Likewise, with $y_i$ replaced by $x_i^q$, a term contains but a single $x_i$.
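The identity (4.11) is easy to test numerically: compute qN directly from the c-element products, and again from the single-variable sums U_r combined with the elementary symmetric sums Y_{c-r} of the y_i = x_i^q. A Python sketch, with sample values and helper names of our own:

```python
import itertools
import math

def elementary(ys, r):
    """Y_r: sum of the products of r different factors taken from ys (Y_0 = 1)."""
    return sum(math.prod(comb) for comb in itertools.combinations(ys, r))

def qn_direct(xs, c, q):
    """q * N with N = sum over c-element products of P_c**q * log P_c, as in (4.3)."""
    return q * sum(math.prod(comb) ** q * math.log(math.prod(comb))
                   for comb in itertools.combinations(xs, c))

def qn_from_single_sums(xs, c, q):
    """Right member of (4.11): sum over r = 1..c of (-1)**(r-1) * Y_{c-r} * U_r."""
    ys = [x ** q for x in xs]

    def u(r):  # U_r = sum of y_i**r * log y_i; each term involves a single y_i
        return sum(y ** r * math.log(y) for y in ys)

    return sum((-1) ** (r - 1) * elementary(ys, c - r) * u(r) for r in range(1, c + 1))

xs, q = [1.5, 2.0, 3.0, 5.0, 7.0], 0.7
for c in (1, 2, 3, 4):
    print(c, qn_direct(xs, c, q), qn_from_single_sums(xs, c, q))
```

The two columns agree to rounding error for each c.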





5. Transformations. A function $f(x_1, x_2, \ldots, x_n)$ is not in general a mean of its arguments $x_i$. However, it is often possible to make a substitution $x_i = \phi(y_i)$ so that

$$f[\phi(y_1), \phi(y_2), \ldots, \phi(y_n)] = g(y_1, y_2, \ldots, y_n) \qquad (5.1)$$

is a mean of its arguments $y_i$.

The required substitution is sometimes obvious, as in the case of the estimate s of scale

$$s = \left[(1/n)\sum (x_i - m)^2\right]^{1/2} = \left[(1/n)\sum y_i^2\right]^{1/2}. \qquad (5.2)$$

Here s is a mean of the $y_i$, although it is not a mean of the $x_i$.

Definition. Let $y = \psi(x)$, in general multiple valued, be defined in an interval I, finite or infinite, the values of y lying in an interval J. Suppose that for each y in J, there is at least one x in I such that $\psi(x) = y$. Let any such x be designated by $\phi(y)$. Then $\phi(y)$ will be called the inverse of $\psi(x)$. It follows that one value of

$$\psi[\phi(y)] = y. \qquad (5.3)$$

Theorem. Let

$$z = f(x_1, x_2, \ldots, x_n), \qquad (5.4)$$

in general multiple valued, be defined when each $x_i$ is in some interval I, finite or infinite. With x in I, set

$$\psi(x) = f(x, x, \ldots, x); \qquad (5.5)$$

and suppose that $y = \psi(x)$ has an inverse, $x = \phi(y)$, defined in J. Let $x_i = \phi(y_i)$ be substituted into f to form the function

$$w = f[\phi(y_1), \phi(y_2), \ldots, \phi(y_n)] = g(y_1, y_2, \ldots, y_n). \qquad (5.6)$$

Then w is a mean of the $y_i$, defined when each $y_i$ is in J. It is thus a mean of the $\psi(x_i)$, where $x_i$ is in I.

If, further, ψ(x) is a continuous increasing function of x, then for a given set of $x_i$, the values of z and w are identical. The same is true for a given set of n values $y_i$.
Proof. If each $y_i = c$, a number in J, then

$$f[\phi(y_1), \ldots, \phi(y_n)] = f[\phi(c), \ldots, \phi(c)] = \psi[\phi(c)]. \qquad (5.7)$$

And one value of $\psi[\phi(c)]$ is c, from the definition of the inverse function $\phi(y)$. Moreover, if a number c' is taken in I, then $\psi(c')$ is some number in J, which we may call c; and the argument above is applicable. Finally, if ψ(x) is continuous and increasing, then a number $x_i$ in I is associated with one and only one $y_i$ in J; and vice versa. Thus w and z become identical.

In the foregoing, we started with f, which is not a mean of its arguments $x_i$, and obtained g, which is a mean of the $y_i$. Something like the reverse of this is possible. The last member of (5.2) is a mean of the $y_i$. It was obtained by treating m as a constant with respect to the $x_i$. If, however, m is an estimate for location and is taken as $(1/n)\sum x_i$, and this is substituted into (5.2), then

$$s = \left\{[(n-1)/n^2]\sum x_i^2 - (2/n^2)\sum x_i x_j\right\}^{1/2}, \qquad i < j. \qquad (5.8)$$

This s is now not a mean of the $x_i$; for if the $x_i$ all equal any constant c, then $s = 0$. Furthermore, there exists no single valued continuous increasing function $x = \phi(y)$ such that if $x_i = \phi(y_i)$ is substituted into (5.8), s will be a mean of the $y_i$. Thus the elimination of m from (5.2) interferes with the status of s as a mean of the $x_i$.
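The contrast between (5.2) and (5.8) can be seen with a few numbers. In the Python sketch below (helper names ours), the fixed-m form returns the common value of the y_i = x_i - m when these are all equal, as a mean must, while the eliminated form returns 0 for equal x_i; for unequal data the two forms agree once m is set to the sample mean.

```python
import math

def s_fixed_location(xs, m):
    """(5.2): s = [(1/n) * sum (x_i - m)^2]^(1/2), with m treated as a constant."""
    return math.sqrt(sum((x - m) ** 2 for x in xs) / len(xs))

def s_eliminated(xs):
    """(5.8): the same s after substituting m = (1/n) * sum x_i."""
    n = len(xs)
    squares = sum(x * x for x in xs)
    cross = sum(xs[i] * xs[j] for i in range(n) for j in range(i + 1, n))  # pairs with i < j
    # max() guards against a tiny negative value produced by rounding.
    return math.sqrt(max(0.0, (n - 1) / n ** 2 * squares - 2 / n ** 2 * cross))

m = 10.0
xs_equal = [m + 3.0] * 3                  # every y_i = x_i - m equals 3
print(s_fixed_location(xs_equal, m))      # 3.0: a mean of the y_i
print(s_eliminated(xs_equal))             # 0.0: not a mean of the x_i
xs = [2.0, 4.0, 4.0, 5.0]
print(s_eliminated(xs), s_fixed_location(xs, sum(xs) / len(xs)))  # the two forms agree
```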


6. Indeterminate Forms that arise in testing for Means. Sometimes a function f is substantially continuous. But the investigation leading to the function fails to assign to the function a value for certain values of the argument x, or arguments $x_1, x_2, \ldots, x_n$. However, values are often assignable which will make the function continuous. This is the usual occurrence when, in curve fitting, parameters are estimated. In general, the measurements are assumed to be not all alike. However, when a general function such as $\sum x_i/n$ for location is obtained, we do not hesitate to assign to this function the value c when each $x_i = c$, to make the function continuous.

As another illustration of “indeterminate forms,” consider the Jackson [30] median, M, of four numbers $x_1 \le x_2 < x_3 \le x_4$, viz.,

$$M = (x_4x_3 - x_2x_1)/(x_4 + x_3 - x_2 - x_1). \qquad (6.1)$$

A direct substitution of $x_i = c$ renders M indeterminate. But if $x_i \to c$, indeed, if merely $x_2 \to c$ and $x_3 \to c$, so also does M.
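A quick numerical illustration of the limiting behavior of (6.1), with the values x_1 = 1, x_4 = 12 and c = 5 chosen by us: direct substitution fails with a division by zero, while letting only x_2 and x_3 approach c drives M to c.

```python
def jackson_median(x1, x2, x3, x4):
    """Jackson median (6.1) of four numbers x1 <= x2 < x3 <= x4."""
    return (x4 * x3 - x2 * x1) / (x4 + x3 - x2 - x1)

c = 5.0
try:
    jackson_median(c, c, c, c)  # direct substitution gives 0/0
except ZeroDivisionError:
    print("indeterminate at x_i = c")

for eps in (1e-1, 1e-3, 1e-6):
    # Only x2 and x3 approach c; x1 = 1 and x4 = 12 stay fixed.
    print(eps, jackson_median(1.0, c - eps, c + eps, 12.0))
```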

In a recent paper, R. Cisbani [33] generalizes means suggested by Dunkel [32] and L. Galvani [34] by setting up

$$g_n(x) = \left[\frac{1}{n}\sum (a + \ldots)^x\right]^{1/x}, \qquad x \neq 0; \qquad (6.2)$$

and letting $n \to \infty$. There results an integral with the value

$$g(x) = \left[\frac{b^{x+1} - a^{x+1}}{(x+1)(b-a)}\right]^{1/x} \qquad (6.3)$$

for the case $x \neq -1$. This mean, set up as a mean of an infinite number of variates, turns out to be also a mean of the two numbers a and b, which for $b = a$ becomes indeterminate. But as b approaches a, so also does g(x) approach a. This is also true for the special cases $x = -1$, etc.

In testing to see if a function m of the $x_i$ is a mean of these numbers, a difficulty sometimes arises, because a substitution of $x_i = c$ and $m = c$ into the equation which implicitly defines m will put zeros into denominators. An aid in such testing will now be formulated as a theorem, although the ideas involved are not essentially new.
Theorem. Let f(x) be a continuous increasing function of x defined for each real x. Let

$$f(0) = 0. \qquad (6.4)$$

Given n real distinct numbers

$$x_1 < x_2 < \cdots < x_{n-1} < x_n, \qquad (6.5)$$

n positive numbers $k_i$, and a real number C. Set

$$F(x) = \frac{k_1}{f(x_1 - x)} + \cdots + \frac{k_n}{f(x_n - x)} - C. \qquad (6.6)$$

Then $F(x) = 0$ has $n - 1$ real roots $m_i$, such that

$$x_1 < m_1 < x_2 < m_2 < \cdots < m_{n-1} < x_n; \qquad (6.7)$$

also, a root less than $x_1$, provided

$$\sum k_i/f(+\infty) < C; \qquad (6.8)$$

or a root greater than $x_n$, provided

$$\sum k_i/f(-\infty) > C. \qquad (6.9)$$

Proof. Since f(x) is a continuous increasing function of x, so also is $k_i/f(x_i - x)$, except for the single value $x = x_i$. So also, then, is F(x), except when $x = x_1$ or $x_2$ or ... or $x_n$. But

$$F(x_i + 0) = -\infty; \qquad F(x_{i+1} - 0) = +\infty. \qquad (6.10)$$

Hence, between $x_i$ and $x_{i+1}$, there exists a root $m_i$ of $F(x) = 0$.

Moreover, since

$$F(-\infty) = \left[\sum k_i/f(+\infty)\right] - C; \qquad F(x_1 - 0) = +\infty; \qquad (6.11)$$

it follows that there is a root less than $x_1$, provided (6.8) is satisfied. Likewise, there is a root greater than $x_n$ if (6.9) is satisfied.

The use of this theorem in testing for means is simple. Keeping the $x_i$ distinct, the equation $F(x) = 0$ determines $(n - 1)$ numbers, $m_j$, such that if $x_i \to c$, so also do these $m_j \to c$. Employing continuity to define $m_j$ when each $x_i = c$, we may say that each $m_j$ is a mean of the $x_i$; $j = 1, 2, \ldots, (n - 1)$; $i = 1, 2, \ldots, n$, when the conditions of this theorem are satisfied. If $F(x) = 0$ has still another root, m, this m will not in general be a mean of the $x_i$.
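The theorem suggests a simple numerical procedure: because F is increasing on each gap and runs from -∞ to +∞ there, bisection inside each interval (x_j, x_{j+1}) locates m_j. The Python sketch below uses f(u) = u, with weights k_i and a value of C of our own choosing; it also shows the roots crowding toward c as the x_i are squeezed together around c = 5. Helper names are hypothetical.

```python
def make_F(xs, ks, C, f=lambda u: u):
    """F(x) = sum k_i / f(x_i - x) - C, equation (6.6)."""
    return lambda x: sum(k / f(xi - x) for xi, k in zip(xs, ks)) - C

def interior_roots(xs, ks, C, f=lambda u: u, iters=200):
    """The n - 1 roots m_j with x_j < m_j < x_{j+1}, located by bisection.

    Assumes xs strictly increasing, all k_i > 0, and f continuous, increasing,
    with f(0) = 0, so that F rises from -inf to +inf on each gap.
    """
    F = make_F(xs, ks, C, f)
    roots = []
    for lo, hi in zip(xs, xs[1:]):
        for _ in range(iters):
            mid = (lo + hi) / 2
            if mid == lo or mid == hi:  # no floating-point number left in between
                break
            if F(mid) < 0:
                lo = mid
            else:
                hi = mid
        roots.append((lo + hi) / 2)
    return roots

ks, C = [1.0, 2.0, 1.0], 0.5
print(interior_roots([0.0, 1.0, 3.0], ks, C))  # one root in (0, 1), one in (1, 3)
for spread in (1.0, 1e-2, 1e-4):
    near = [5.0 + spread * t for t in (-1.0, 0.0, 2.0)]
    print(spread, interior_roots(near, ks, C))  # the roots close in on 5.0
```

Since F is never evaluated at an endpoint x_j, the poles of F cause no difficulty.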


7. Summational Means arising in the Estimation of Parameters of Frequency Distributions. In curve fitting, the estimation of parameters leads in general to summational means. If the method of moments is used, the first step is to find the moments by summation. I have already considered estimates for location and scale by this method [7], and by the R. A. Fisher method of maximum likelihood [4]. A further study of the results of the likelihood method will now be made.

By this method, products which first appear are reduced to sums by logarithms, and the means found are, in general, summational. Some idea of the forms of these means can be obtained by examining a rather general form of frequency function which includes the Pearson Type I, and involves parameters with estimates $p > 0$ and $q > 0$, in addition to the location m and scale a. Let the observations be $x_1, x_2, \ldots, x_n$; let

$$t = (x - m)/a; \qquad 0 \le t \le 1; \qquad a > 0; \qquad (7.1)$$

$$y = \frac{1}{a}\,\frac{\Gamma(p+q)}{\Gamma(p)\Gamma(q)}\, t^{p-1}(1-t)^{q-1}. \qquad (7.2)$$

The likelihood L is obtained by multiplying together the n factors obtained by substituting $t = t_1, t_2, \ldots, t_n$. Then

$$\log L = -n\log a + n\log\Gamma(p+q) - n\log\Gamma(p) - n\log\Gamma(q) + (p-1)\sum\log t_i + (q-1)\sum\log(1 - t_i). \qquad (7.3)$$


From $\partial L/\partial m = 0$, there is obtained

$$P\sum\frac{1}{x_i - m} + Q\sum\frac{1}{x_i - m - a} = 0; \qquad P = p - 1, \quad Q = q - 1. \qquad (7.4)$$

Suppose $P \neq 0$ and $Q \neq 0$; and as a first case, suppose $P + Q \neq 0$. If each $x_i$ is replaced by x, the above equation leads to $m = x - Pa/(P + Q)$.

Then m is a summational mean of

$$x_i' = x_i - Pa/(P + Q), \qquad i = 1, 2, \ldots, n; \qquad (7.5)$$

as seen by applying the Theorem in Section 5.

Likewise, a is a summational mean of

$$x_i'' = (x_i - m)(P + Q)/P. \qquad (7.6)$$

If $P \neq 0$, $Q \neq 0$, but $P + Q = 0$, then (7.4) becomes

$$\sum\frac{1}{x_i - m - a} = \sum\frac{1}{x_i - m}. \qquad (7.7)$$

Now set $y_i = x_i - m$, $C = \sum 1/y_i$; and write (7.7) as

$$F(a) = \sum\frac{1}{y_i - a} - C = 0. \qquad (7.8)$$

This has the form given in (6.6) with x replaced by a, $k_i = 1$, $f(a) = a$. If then $y_1 < y_2 < \cdots < y_n$, there exist $(n - 1)$ solutions of $F(a) = 0$ between $y_1$ and $y_n$. And thus, keeping the $y_i$ distinct, if $y_i \to c$, so also do the $a_j \to c$.

These $a_j$ are then means of the $y_i$, and thus means of the $x_i - m$.

In the more general case where $P + Q \neq 0$, it is seen also that Q is a summational mean of

$$P\left[\frac{a}{x_i - m} - 1\right]. \qquad (7.9)$$

From $\partial L/\partial a = 0$, quite analogous results are obtained. The special case now, however, is given by $P + Q + 1 = 0 = p + q - 1$. And, with the continuity interpretation, a is a mean of the $x_i - m$; and moreover, m is a mean of the $x_i - a$.

Using now the digamma function

$$\digamma(u) = \frac{d}{du}\log\Gamma(u), \qquad (7.10)$$

set

$$D(p) = \digamma(p + q) - \digamma(p). \qquad (7.11)$$

The condition $\partial L/\partial p = 0$ then leads to

$$D(p) = (1/n)\sum(-\log t_i), \qquad 0 < t_i < 1. \qquad (7.12)$$

Now, with $q > 0$, $D(\infty) = 0$, $D(-1 + 0) = \infty$; and D(p) is a continuous decreasing function of p, when $p > -1$. Then, since $-\log t_i > 0$, there is a unique $p > -1$ to satisfy (7.12).

To be useful, here, p should be $> 0$. But, at all events, the p thus found is a mean of $D^{-1}(-\log t_i)$, where $D^{-1}$ is inverse to D.
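Equation (7.12) can be solved numerically. Python's standard library has no digamma function, so the sketch below implements one with the recurrence Ϝ(x) = Ϝ(x + 1) - 1/x followed by the standard asymptotic series, and then bisects on p > 0, where, for q > 0, D is continuous and decreasing from +∞ to 0. All helper names are our own. For q = 1, D(p) = 1/p exactly, which gives a closed-form check.

```python
import math

def digamma(x):
    """Digamma function for x > 0: shift x up to at least 10, then use the asymptotic series."""
    result = 0.0
    while x < 10.0:
        result -= 1.0 / x  # digamma(x) = digamma(x + 1) - 1/x
        x += 1.0
    inv2 = 1.0 / (x * x)
    return result + math.log(x) - 0.5 / x - inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252))

def D(p, q):
    """D(p) = digamma(p + q) - digamma(p), as in (7.11)."""
    return digamma(p + q) - digamma(p)

def solve_p(ts, q, lo=1e-8, hi=1e8, iters=200):
    """Solve D(p) = (1/n) * sum(-log t_i), equation (7.12), for p > 0.

    For q > 0, D is continuous and decreasing on p > 0, from +inf down to 0,
    so a positive right member has exactly one root there.
    """
    target = sum(-math.log(t) for t in ts) / len(ts)
    for _ in range(iters):
        mid = math.sqrt(lo * hi)  # bisect on a log scale: the bracket spans 16 decades
        if D(mid, q) > target:
            lo = mid              # D too large means p is too small
        else:
            hi = mid
    return math.sqrt(lo * hi)

ts = [0.2, 0.5, 0.7, 0.9]
print(solve_p(ts, 1.0))  # equals n / sum(-log t_i) when q = 1
print(solve_p(ts, 2.5))
```

With q = 1 the solution is p = n/Σ(-log t_i), the harmonic mean of the 1/(-log t_i); this is a concrete instance of the remark that p is a mean of D^{-1}(-log t_i), since here D^{-1}(y) = 1/y.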

The digamma function (7.10) appears also in estimating the parameters for the Pearson Type III,

$$y = \frac{t^p e^{-t}}{a\,\Gamma(p+1)}, \qquad t = (x - m)/a, \qquad p > -1. \qquad (7.13)$$

By setting $\partial L/\partial p = 0$, it is found that m is the mean of $\ldots$; a is the arithmetic mean of $(x_i - \ldots)$; while p is a summational mean of $\digamma^{-1}\{\log[(x_i - m)/a]\} - 1$, where $\digamma^{-1}$ is the inverse of Ϝ. From $\partial L/\partial m = 0$, it is found that m is a summational mean of $x_i - pa$; a is the harmonic mean of $(x_i - m)/p$; and p is the harmonic mean of $(x_i - m)/a$. Finally, from $\partial L/\partial a = 0$, there is obtained

$$(1/n)\sum x_i = m + a(p + 1), \qquad (7.14)$$

which makes m, a, and p each an arithmetic mean of a simple function of the observations $x_i$, when the other two estimates are taken as constants.

Comparison of (5.2) with (5.8) has shown that after complete elimination, estimates may cease to be means. However, it may be noted that s is more frequently exhibited in the form (5.2), where it is a mean, than in the form (5.8), where it is not.
8. Generalizations. The extension of results from the discrete or discontinuous case, where a mean m depends upon only a finite number of elements, to the continuous case is fairly immediate, with integration taking the place of summation, and a distribution or frequency function taking the place of discrete weights $c_i$. Stieltjes and Lebesgue integrals may be used as well as Riemannian. Such a generalization of the Chisini mean was given by de Finetti [2].

The summational mean, which I have defined as involving possibly several summations, may be generalized likewise.

In terms of set functions, sometimes called fonctionnelles, I gave [36] the following general definition of a mean with a point set H in mind as a distribution function.

Definition. Let E and H be sets of numbers. Such a number t may be a real number or a vector number $t = (t_1, t_2, \ldots)$.

Let $E_t$ be the result of replacing each number of E by a single number t.

Then the mean m of numbers in E, relative to the set H and to a function f, is given by $m = f(E, H)$; provided that the function f has been so constructed that for each t in E, $f(E_t, H) = t$, or at least one value of this f is t. It is to be understood above that when E is changed to $E_t$, the set H remains unaltered.

This retains the chief feature of $f(t, t, \ldots, t) = t$ in explicit form or of $F(t, t, \ldots, t) = F(t_1, t_2, \ldots, t_n)$ in implicit form, where t is a mean of $t_1, t_2, \ldots, t_n$.

I used [36] a somewhat less general definition to discuss regression coefficients. All such means may well be called substitutive or representative.
THE PRODUCT SEMI-INVARIANTS OF THE MEAN AND A
CENTRAL MOMENT IN SAMPLES

By Cecil C. Craig

The method developed by the author for calculating the semi-invariants and product semi-invariants of moments in samples from any infinite population¹ is not immediately applicable to the calculation of product semi-invariants of the mean and a central moment in such samples. In the present paper this method is adapted for this purpose, so that the calculation of these product semi-invariants becomes routine. As will be seen, the computing is a little heavier than in the case of central moments alone for results of equal weight. A table of results up to weight ten for the mean and the second, third and fourth central moments is given. The author plans to apply these to a further study of the sampling characteristics of the coefficient of variation and of Fisher's $t$ in samples from non-normal populations.

Let a random sample $x_1, x_2, \dots, x_N$ of $N$ observations be drawn from an infinite population characterized by the semi-invariants $\lambda_1, \lambda_2, \lambda_3, \dots$. The sample mean is

$$\bar x = \sum_{i=1}^{N} x_i / N,$$

and the $n$-th central moment of the sample is

$$m_n = \frac{1}{N}\sum_{i=1}^{N}(x_i - \bar x)^n.$$

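Then the product semi-invariants of order $k + l$ of $\bar x$ and $m_n$, $S_{kl}(\bar x, m_n)$, are defined by the formal identity in the parameters $\vartheta$ and $\omega$:

$$(S_{10}\vartheta + S_{01}\omega) + \frac{1}{2!}(S_{10}\vartheta + S_{01}\omega)^{(2)} + \frac{1}{3!}(S_{10}\vartheta + S_{01}\omega)^{(3)} + \cdots \equiv \log E\left(e^{\bar x\vartheta + m_n\omega}\right), \qquad (1)$$

in which $E$ denotes the mathematical expectation over the set of all such samples and

$$(S_{10}\vartheta + S_{01}\omega)^{(k)} = S_{k0}\vartheta^k + \binom{k}{1}S_{k-1,1}\vartheta^{k-1}\omega + \binom{k}{2}S_{k-2,2}\vartheta^{k-2}\omega^2 + \cdots + S_{0k}\omega^k.$$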

¹ "An Application of Thiele's Semi-invariants to the Sampling Problem," Metron, Vol. VII, part IV (1928), pp. 8-76.
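If we denote by $M_{kl}$ the product moments of $\bar x$ and $m_n$, we have by definition the further formal identity in $\vartheta$ and $\omega$:

$$E\left(e^{\bar x\vartheta + m_n\omega}\right) \equiv 1 + (M_{10}\vartheta + M_{01}\omega) + \frac{1}{2!}(M_{10}\vartheta + M_{01}\omega)^{(2)} + \cdots, \qquad (2)$$

in which $(M_{10}\vartheta + M_{01}\omega)^{(k)}$ is to be expanded in the same manner as $(S_{10}\vartheta + S_{01}\omega)^{(k)}$ above.

Let us write

$$\delta_i = x_i - \bar x,$$

so that $m_n = \frac{1}{N}\sum_i \delta_i^n$. (Summations with respect to $i$ and $j$ always run from 1 to $N$.) Now we define a new set of product semi-invariants, $\lambda_{k l_1 l_2 \cdots l_N}$, of the sum $\sum x_i$ and the $N$ $\delta_i$'s, by means of

$$(\lambda_{10}\vartheta + \textstyle\sum_i\lambda_{01}\omega_i) + \frac{1}{2!}(\lambda_{10}\vartheta + \sum_i\lambda_{01}\omega_i)^{(2)} + \cdots \equiv \log E\left(e^{\vartheta\sum_i x_i + \sum_i\omega_i\delta_i}\right),$$

in which, for example,

$$(\lambda_{10}\vartheta + \textstyle\sum_i\lambda_{01}\omega_i)^{(2)} = \lambda_{200\cdots0}\vartheta^2 + 2\lambda_{110\cdots0}\vartheta\omega_1 + 2\lambda_{1010\cdots0}\vartheta\omega_2 + \cdots + \lambda_{020\cdots0}\omega_1^2 + \lambda_{0020\cdots0}\omega_2^2 + \cdots + 2\lambda_{0110\cdots0}\omega_1\omega_2 + \cdots.$$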

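We may set

$$\delta_i = \sum_j a_{ij}x_j, \qquad\text{with}\quad a_{ij} = -\frac{1}{N}\ (i \ne j), \qquad a_{ii} = \frac{N-1}{N}.$$

Then

$$\vartheta\sum_i x_i + \sum_i \omega_i\delta_i = \sum_j t_j x_j,$$

in which

$$t_j = \vartheta + \sum_i a_{ij}\omega_i.$$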

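It follows then that

$$(\lambda_{10}\vartheta + \textstyle\sum\lambda_{01}\omega_i) + \frac{1}{2!}(\lambda_{10}\vartheta + \sum\lambda_{01}\omega_i)^{(2)} + \frac{1}{3!}(\lambda_{10}\vartheta + \sum\lambda_{01}\omega_i)^{(3)} + \cdots \equiv \lambda_1\sum_j t_j + \frac{\lambda_2}{2!}\sum_j t_j^2 + \frac{\lambda_3}{3!}\sum_j t_j^3 + \cdots,$$

from which

$$(\lambda_{10}\vartheta + \textstyle\sum\lambda_{01}\omega_i)^{(k+l)} = \lambda_{k+l}\sum_j\Big(\vartheta + \sum_i a_{ij}\omega_i\Big)^{k+l}.$$

From this

$$\lambda_{k00\cdots0} = N\lambda_k, \qquad \lambda_{k10\cdots0} = \lambda_{k010\cdots0} = \cdots = 0,$$

and generally²

$$\lambda_{k l_1 l_2\cdots l_N} = \frac{\lambda_{k+l}}{N^l}\sum_{j=1}^{N}(-1)^{l-l_j}(N-1)^{l_j} \qquad (l_1 + l_2 + \cdots + l_N = l). \qquad (3)$$

This is the first result to be used in calculating values of the $S_{kl}$'s. Note that the value of $\lambda_{k l_1 l_2\cdots l_N}$ is independent of the order in which a given set of $l$'s occurs.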

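Calculation of particular $\lambda_{k l_1\cdots l_N}$'s in terms of $N$ and the semi-invariants of the sampled population is both simple and rapid, as one may see from a pair of examples:

$$\lambda_{k4} = \lambda_{k40\cdots0} = \lambda_{k040\cdots0} = \cdots$$

(suppressing superfluous zeros in the subscripts)

$$= \frac{\lambda_{k+4}}{N^4}\left[(N-1)^4 + (N-1)\right] = \frac{(N-1)(N^2-3N+3)}{N^3}\lambda_{k+4}.$$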

Then, too, 


Xit2 



Xjlr.,.2. 


For a second example:

$$\lambda_{k31} = -\frac{\lambda_{k+4}}{N^4}\left[(N-1)^3 + (N-1) - (N-2)\right] = -\frac{N(N^2-3N+3)}{N^4}\lambda_{k+4} = -\frac{N^2-3N+3}{N^3}\lambda_{k+4}.$$
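Formula (3) lends itself to a mechanical check. The sketch below is an illustration, not part of the original computation, and its function names are hypothetical. It evaluates the coefficient of $\lambda_{k+l}$ directly as $\sum_j \prod_i a_{ij}^{l_i}$, with $a_{ii} = (N-1)/N$ and $a_{ij} = -1/N$ as defined above. It compares that value with the closed form (3) in exact rational arithmetic and also confirms the two worked examples.

```python
from fractions import Fraction


def coeff_direct(N, ls):
    """Coefficient of lambda_{k+l} in lambda_{k l_1 ... l_N}: sum_j prod_i a_ij**l_i.

    ls lists the non-zero l's; the remaining N - len(ls) entries are zero.
    """
    ls = list(ls) + [0] * (N - len(ls))
    total = Fraction(0)
    for j in range(N):
        term = Fraction(1)
        for i, li in enumerate(ls):
            a_ij = Fraction(N - 1, N) if i == j else Fraction(-1, N)
            term *= a_ij ** li
        total += term
    return total


def coeff_formula(N, ls):
    """Closed form (3): N**(-l) * sum_j (-1)**(l - l_j) * (N - 1)**l_j."""
    ls = list(ls) + [0] * (N - len(ls))
    l = sum(ls)
    return Fraction(sum((-1) ** (l - lj) * (N - 1) ** lj for lj in ls), N ** l)


print(coeff_formula(7, (4,)), coeff_formula(7, (3, 1)))
```

Because the coefficient depends only on the multiset of $l$'s, the same numbers serve for every $k$. This agrees with the remark that the value is independent of the order of the $l$'s.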

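Now the semi-invariants $S_{kl}$ can be expressed directly in terms of the product moments $\nu_{k l_1 l_2\cdots l_N}$ of the sum $\sum x_i$ and the $N$ $\delta_i$'s. These product moments are given by the appropriate moment generating function:

$$E\left(e^{\vartheta\sum_i x_i + \sum_i\omega_i\delta_i}\right) \equiv 1 + (\nu_{10}\vartheta + \textstyle\sum\nu_{01}\omega_i) + \frac{1}{2!}(\nu_{10}\vartheta + \sum\nu_{01}\omega_i)^{(2)} + \cdots.$$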

² As written, this result is valid if at least one of the $l_i$'s is zero, which is always the case if $N$, the size of the sample, is greater than $l$. (Cf. the author's paper cited above, p. 17.)
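Then it is seen that

$$E\left(e^{\bar x\vartheta + m_n\omega}\right) \equiv 1 + \frac{1}{N}\left[\nu_{10}\vartheta + (\textstyle\sum\nu_{0,n_i})\omega\right] + \frac{1}{2!\,N^2}\left[\nu_{10}\vartheta + (\sum\nu_{0,n_i})\omega\right]^{(2)} + \cdots,$$

where $\nu_{0,n_i}$ denotes the product moment with $n$ in the place belonging to $\delta_i$, and in which

$$\left[\nu_{10}\vartheta + (\textstyle\sum\nu_{0,n_i})\omega\right]^{(2)} = \nu_{20}\vartheta^2 + 2(\nu_{1n} + \nu_{10n} + \nu_{100n} + \cdots)\vartheta\omega + (\nu_{0,2n} + \nu_{00,2n} + \nu_{000,2n} + \cdots + 2\nu_{0nn} + 2\nu_{0n0n} + \cdots)\omega^2,$$

etc., and by comparison with (1) and (2) we have

$$(S_{10}\vartheta + S_{01}\omega) + \frac{1}{2!}(S_{10}\vartheta + S_{01}\omega)^{(2)} + \cdots = \log\left\{1 + \frac{1}{N}\left[\nu_{10}\vartheta + (\textstyle\sum\nu_{0,n_i})\omega\right] + \frac{1}{2!\,N^2}\left[\nu_{10}\vartheta + (\sum\nu_{0,n_i})\omega\right]^{(2)} + \cdots\right\}.$$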

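From this

$$(S_{10}\vartheta + S_{01}\omega)^{(k+l)} = \frac{(k+l)!}{N^{k+l}}\sum\frac{(-1)^{p-1}(p-1)!\,\{[\nu_{10}\vartheta + (\sum\nu_{0,n_i})\omega]^{(1)}\}^{r}\,\{[\nu_{10}\vartheta + (\sum\nu_{0,n_i})\omega]^{(2)}\}^{s}\cdots}{(1!)^r\,r!\,(2!)^s\,s!\cdots},$$

in which

$$r + s + t + \cdots = p,$$

the summation extending over all partitions $(1^r 2^s 3^t\cdots)$ of $k + l$. This, of course, is only the usual formula for semi-invariants in terms of moments, appropriately modified. In particular,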

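$$(S_{10}\vartheta + S_{01}\omega)^{(2)} = \frac{1}{N^2}\left\{[\nu_{10}\vartheta + (\textstyle\sum\nu_{0,n_i})\omega]^{(2)} - [\nu_{10}\vartheta + (\sum\nu_{0,n_i})\omega]^{2}\right\}.$$

If we write

$$\nu_{10}\vartheta + (\textstyle\sum\nu_{0,n_i})\omega = W,$$

then

$$(S_{10}\vartheta + S_{01}\omega)^{(3)} = \frac{1}{N^3}\left[W^{(3)} - 3W^{(2)}W + 2W^3\right],$$

$$(S_{10}\vartheta + S_{01}\omega)^{(4)} = \frac{1}{N^4}\left[W^{(4)} - 4W^{(3)}W - 3(W^{(2)})^2 + 12W^{(2)}W^2 - 6W^4\right].$$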

Now the $\nu_{k l_1\cdots l_N}$'s can be replaced by their values in terms of the $\lambda_{k l_1\cdots l_N}$'s, the details of which will be explained below, and it will be evident that any $\nu_{k l_1 l_2\cdots l_N}$ is unaltered by a permutation of the $l$'s in its subscript. Taking account of this, the formulae (6) may be written in the expanded forms:

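$$S_{11}(\bar x, m_n) = \frac{1}{N}\left[\nu_{1n} - \nu_{10}\nu_{0n}\right],$$

$$S_{21}(\bar x, m_n) = \frac{1}{N^2}\left[\nu_{2n} - \nu_{20}\nu_{0n} - 2\nu_{1n}\nu_{10} + 2\nu_{10}^2\nu_{0n}\right],$$

$$S_{12}(\bar x, m_n) = \frac{1}{N^2}\left[\nu_{1,2n} + (N-1)\nu_{1nn} - \nu_{10}\nu_{0,2n} - (N-1)\nu_{10}\nu_{0nn} - 2N\nu_{1n}\nu_{0n} + 2N\nu_{10}\nu_{0n}^2\right].$$

But, with no loss in generality, the origin may be taken at the population mean, so that $\lambda_1 = 0$. In this case it will be found that $\nu_{10} = 0$, and these formulae become:

$$S_{11}(\bar x, m_n) = \nu_{1n}/N,$$

$$S_{21}(\bar x, m_n) = \frac{1}{N^2}\left[\nu_{2n} - \nu_{20}\nu_{0n}\right],$$

$$S_{12}(\bar x, m_n) = \frac{1}{N^2}\left[\nu_{1,2n} + (N-1)\nu_{1nn} - 2N\nu_{1n}\nu_{0n}\right],$$

$$S_{31}(\bar x, m_n) = \frac{1}{N^3}\left[\nu_{3n} - \nu_{30}\nu_{0n} - 3\nu_{1n}\nu_{20}\right],$$

$$\cdots$$

$$S_{22}(\bar x, m_n) = \frac{1}{N^3}\left[\nu_{2,2n} + (N-1)\nu_{2nn} - 2N\nu_{2n}\nu_{0n} - \nu_{20}\nu_{0,2n} - (N-1)\nu_{20}\nu_{0nn} - 2N\nu_{1n}^2 + 2N\nu_{20}\nu_{0n}^2\right],$$

$$S_{13}(\bar x, m_n) = \frac{1}{N^3}\big[\nu_{1,3n} + 3(N-1)\nu_{1,2n,n} + (N-1)(N-2)\nu_{1nnn} - 3N\nu_{1,2n}\nu_{0n} - 3N(N-1)\nu_{1nn}\nu_{0n} - 3N\nu_{1n}\nu_{0,2n} - 3N(N-1)\nu_{1n}\nu_{0nn} + 6N^2\nu_{1n}\nu_{0n}^2\big].$$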

These formulae are the second result used in the actual calculation of the $S_{kl}(\bar x, m_n)$'s. One begins with them, putting in the particular value of $n$ for the central moment in question. If, for instance, we wish to compute the product semi-invariants of the mean and variance in samples of $N$, we begin with the set of formulae:

$$S_{11}(\bar x, m_2) = \nu_{12}/N,$$

$$S_{21}(\bar x, m_2) = \frac{1}{N^2}\left[\nu_{22} - \nu_{20}\nu_{02}\right],$$

$$S_{12}(\bar x, m_2) = \frac{1}{N^2}\left[\nu_{14} + (N-1)\nu_{122} - 2N\nu_{12}\nu_{02}\right],$$

$$\cdots$$


The second step is to replace the product moments which appear by their values in terms of the corresponding product semi-invariants. This process can perhaps be best explained by some examples.
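Consider the complete calculation of $S_{12}(\bar x, m_2)$. From the expression for the fifth central moment in terms of semi-invariants,

$$\mu_5 = \lambda_5 + 10\lambda_3\lambda_2,$$

we can write the corresponding expression for product moments in terms of product semi-invariants:

$$(\nu_{10}\vartheta + \textstyle\sum\nu_{01}\omega_i)^{(5)} = (\lambda_{10}\vartheta + \sum\lambda_{01}\omega_i)^{(5)} + 10\,(\lambda_{10}\vartheta + \sum\lambda_{01}\omega_i)^{(3)}(\lambda_{10}\vartheta + \sum\lambda_{01}\omega_i)^{(2)}. \qquad (8)$$

Then we get $\nu_{14}$ by comparing coefficients of $\vartheta\omega_1^4$, and $\nu_{122}$ by comparing coefficients of $\vartheta\omega_1^2\omega_2^2$, in this identity. For an index as low as 5, these coefficients are readily picked out by inspection; for larger indices the use of Hammond operators reduces this to a mechanical routine.³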


In this case, writing $(12)$ for $\lambda_{12}$ and so on, the Hammond operators applied to $(14)$ give

$$(12)(02) + (03)(11).$$

To the terms on the right the appropriate binomial coefficients must be applied, giving

$$3(12)(02) + 2(03)(11).$$

The total of these coefficients is $5 = \frac{5!}{1!\,4!}$, a necessary check. Then, multiplying these coefficients by 10/5, we have

$$6\lambda_{12}\lambda_{02} + 4\lambda_{03}\lambda_{11}$$

for the required coefficients in the second term in (8). Thus

$$\nu_{14} = \lambda_{14} + (6\lambda_{12}\lambda_{02} + 4\lambda_{03}\lambda_{11}).$$


The two terms in parentheses arise from the same term in (8) and would both give rise to terms in $\lambda_3\lambda_2$ in the final result if $\lambda_{11}$ were not identically zero from (3). In practice, all terms in which $\lambda_{11}$ is a factor are crossed out as they appear. Next, for $(122)$ the Hammond operators give

$$2(12)(02) + (111)(011) + 2(021)(11).$$

($\lambda_{002} = \lambda_{02}$; $\lambda_{012} = \lambda_{021}$.) With the binomial, or multinomial, coefficients attached, the right member is rewritten

$$6(12)(02) + 12(111)(011) + 12(021)(11).$$


³ Cf. the author, loc. cit., p. 24.
The total of these coefficients is $30 = \frac{5!}{2!\,2!\,1!}$. Then, multiplying each coefficient by 10/30, we have

$$\nu_{122} = \lambda_{122} + (2\lambda_{12}\lambda_{02} + 4\lambda_{111}\lambda_{011} + 4\lambda_{021}\lambda_{11}).$$
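The coefficients 6 and 4, and 2, 4 and 4, can be double-checked by brute force. The sketch below is not the Hammond-operator routine the author refers to. It enumerates every way of splitting the five factors of a fifth-order product moment into a group of three and a group of two. The term $10(\cdot)^{(3)}(\cdot)^{(2)}$ in (8) counts exactly these splits, and the sketch tallies the resulting pairs of product semi-invariants. The function name and the string labels (`"t"` for $\sum x_i$, `"w1"`, `"w2"` for $\delta_1$, $\delta_2$) are hypothetical.

```python
from collections import Counter
from itertools import combinations


def split_counts(factors):
    """Tally the (3-part, 2-part) splits of a list of factor labels.

    Each part is reduced to a subscript tuple: the number of "t" factors first,
    then the counts of each delta label in decreasing order with zeros
    suppressed, so ((1, 2), (0, 2)) stands for lambda_12 * lambda_02.
    """
    omegas = sorted({f for f in factors if f != "t"})

    def subscript(group):
        ls = sorted((sum(1 for g in group if g == w) for w in omegas), reverse=True)
        return (sum(1 for g in group if g == "t"), *[l for l in ls if l])

    tally = Counter()
    positions = range(len(factors))
    for three in combinations(positions, 3):
        two = [i for i in positions if i not in three]
        tally[(subscript([factors[i] for i in three]),
               subscript([factors[i] for i in two]))] += 1
    return tally


print(split_counts(["t", "w1", "w1", "w1", "w1"]))  # nu_14
print(split_counts(["t", "w1", "w1", "w2", "w2"]))  # nu_122
```

In both cases the ten splits add up to the factor 10 in (8). Crossing out every pair that contains $\lambda_{11}$, which vanishes by (3), leaves exactly the terms retained in the text.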
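Going on with the calculation of $S_{12}(\bar x, m_2)$,

$$\nu_{12} = \lambda_{12}, \qquad \nu_{02} = \lambda_{02},$$

and then we have:

$$S_{12}(\bar x, m_2) = \frac{1}{N^2}\Big[\{\lambda_{14} + (N-1)\lambda_{122}\} + \{6\lambda_{12}\lambda_{02} + (N-1)(2\lambda_{12}\lambda_{02} + 4\lambda_{111}\lambda_{011}) - 2N\lambda_{12}\lambda_{02}\}\Big].$$

The first set of terms within braces gives rise to terms in $\lambda_5$; the second to terms in $\lambda_3\lambda_2$. Next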


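$$\lambda_{k4} = \frac{(N-1)(N^2-3N+3)}{N^3}\lambda_{k+4}, \qquad \lambda_{k22} = \frac{2N-3}{N^3}\lambda_{k+4},$$

$$\lambda_{k3} = \frac{(N-1)(N-2)}{N^2}\lambda_{k+3}, \qquad \lambda_{k21} = \lambda_{k12} = -\frac{N-2}{N^2}\lambda_{k+3},$$

$$\lambda_{k2} = \frac{N-1}{N}\lambda_{k+2}, \qquad \lambda_{k11} = -\frac{1}{N}\lambda_{k+2}.$$

This table of values will be of frequent use in further calculations of $S_{kl}$'s. Giving the values of both $\lambda_{k21}$ and $\lambda_{k12}$ here was unnecessary duplication.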

Now only the final reduction is to be carried out. We obtain

$$S_{12}(\bar x, m_2) = \frac{N-1}{N^4}\left[(N-1)\lambda_5 + 4N\lambda_3\lambda_2\right].$$


This result, of order 3 and of weight 5, follows from a quite mechanical procedure and is quite brief. The length of the algebraic computations required grows rapidly as the weight is increased, but for weights no greater than 10 undue labor is not required. For greater weights only time and patience are required to get results, if they are needed. It is to be noted that by this method one may calculate individual terms in the result without doing any of the work required for the remaining terms, and that one may readily shorten the work by getting results to a desired degree of approximation with respect to powers of $1/N$.
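Semi-invariants of total order 2 or 3 coincide with central product moments. As a result, $S_{11}$, $S_{21}$ and $S_{12}$ can be checked exactly for a small discrete population. The sketch below is an independent check, not the author's method. It assumes, for illustration only, a two-point population that takes the value 1 with probability $p$ and 0 otherwise. With $q = 1 - p$, its semi-invariants are $\lambda_2 = pq$, $\lambda_3 = pq(q-p)$, $\lambda_4 = pq(1-6pq)$ and $\lambda_5 = pq(q-p)(1-12pq)$. The sketch enumerates all $2^N$ samples with exact fractions and compares the results with $S_{11} = \frac{N-1}{N^2}\lambda_3$, $S_{21} = \frac{N-1}{N^3}\lambda_4$ and $S_{12} = \frac{N-1}{N^4}[(N-1)\lambda_5 + 4N\lambda_3\lambda_2]$. The function names are hypothetical.

```python
from fractions import Fraction
from itertools import product


def bernoulli_semi_invariants(p):
    """Semi-invariants lambda_2..lambda_5 of a 0/1 population with P(1) = p."""
    q = 1 - p
    return {2: p * q, 3: p * q * (q - p), 4: p * q * (1 - 6 * p * q),
            5: p * q * (q - p) * (1 - 12 * p * q)}


def exact_S(N, p):
    """Exact S11, S21, S12 of (mean, m2), enumerating all 2**N samples.

    For total order <= 3 a product semi-invariant equals the corresponding
    central product moment, so no moment-to-semi-invariant conversion is needed.
    """
    cases = []
    for xs in product((0, 1), repeat=N):
        w = Fraction(1)
        for x in xs:
            w *= p if x else 1 - p
        mean = Fraction(sum(xs), N)
        m2 = sum((x - mean) ** 2 for x in xs) / N
        cases.append((w, mean, m2))
    e_mean = sum(w * a for w, a, _ in cases)
    e_m2 = sum(w * b for w, _, b in cases)

    def central(i, j):
        return sum(w * (a - e_mean) ** i * (b - e_m2) ** j for w, a, b in cases)

    return central(1, 1), central(2, 1), central(1, 2)


def formula_S(N, p):
    """The same three quantities from the formulae in the text (n = 2)."""
    lam = bernoulli_semi_invariants(p)
    return (Fraction(N - 1, N ** 2) * lam[3],
            Fraction(N - 1, N ** 3) * lam[4],
            Fraction(N - 1, N ** 4) * ((N - 1) * lam[5] + 4 * N * lam[3] * lam[2]))


print(exact_S(4, Fraction(1, 3)) == formula_S(4, Fraction(1, 3)))
```

A two-point population is a convenient choice because its semi-invariants are known in closed form. Complete enumeration also avoids any sampling error in the comparison.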

There follows a table of the results so far calculated. 
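For $n = 2$:

$$S_{11} = \frac{N-1}{N^2}\lambda_3,$$

$$S_{21} = \frac{N-1}{N^3}\lambda_4,$$

$$S_{12} = \frac{N-1}{N^4}\left[(N-1)\lambda_5 + 4N\lambda_3\lambda_2\right],$$

$$S_{31} = \frac{N-1}{N^4}\lambda_5,$$

$$S_{22} = \frac{N-1}{N^5}\left[(N-1)\lambda_6 + 4N(\lambda_4\lambda_2 + \lambda_3^2)\right],$$

$$S_{13} = \frac{N-1}{N^6}\left[(N-1)^2\lambda_7 + 12N(N-1)\lambda_5\lambda_2 + 4N(5N-7)\lambda_4\lambda_3 + 24N^2\lambda_3\lambda_2^2\right].$$

It is not difficult to see that in general

$$S_{k1}(\bar x, m_2) = \frac{N-1}{N^{k+1}}\lambda_{k+2}.$$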


For $n = 3$:

$$S_{11} = \frac{(N-1)(N-2)}{N^2}\lambda_4,$$

$$S_{21} = \frac{(N-1)(N-2)}{N^3}\lambda_5,$$

Sa = [(N -DiN- 2)X7 + mN - 2)\,\z 

+ 27N(.N - 2)X4X3 + 18 Ar'X,x|] 

$$S_{31} = \frac{(N-1)(N-2)}{N^4}\lambda_6,$$

Sn = 2) [(JV -l)(N- 2)X, + 9mN - 2)X,X* 

+ 36N(N - 2)X»Xs + 27Ar(Ar - 2)X? + 18Ar*X4Xj + 36Ar’x;Xi] 

8„ = [iV(Ar - l)*(Ar - 2)*X,o 

+ 9(Ar - 1)(3A^ - 12 Ar* + 12 Ar* - SAT + 5)X8Xj 
+ 27Ar(4Ar‘ - 21Ar* + 36iV* - 20Ar + 3)X7X, 

+ 27N\N - 2)\7N - 11)X,X4 + 5iN*{N - 2)(4Ar - 7)X,Xj 
+ 27iV*(JV - 2)*(4iV - 7)Xl + 64JV*(JV - 2)(7»N - 80)\|X,X|

+ 1&2N*{N - 2)(6Ar - 12)XjX, + 64iV*(20JV* - 126JV + 140)X4X; 
+ mN*m - 12 )X 4 XJ + 824Ar*(6Ar - 12 )XlxJ]. 

For $n = 4$:

8n « [(N* - SAT + 3)Xi + QNiN - l)X,Xj 

8n = [(JV* -m + 8)X, + 6N(N - + Xj)l 

Sn = m - 1){N* - ZN + 3)*X, 

+ 4N(N* -ZN + Z){7N* - 18N + 15)XtX, 

+ iN(N* -ZN + 3)(19Ar* - 662V + 68 )X,X« 

+ 42V(29Ar‘ - 1952V’ + 6372V* - 6392V + 351)X,X4 
+ 122V*(172V* - 712V* + 1172V - 69)X,X* 

+ 242V*(352V* - 1732V* + 3092V - 189)X4X|X, 

+ 722V*(2V - 2)*(32V - 5)Xj + 962V*(42V* - 92V + 6)X,XJl 

S„ = [(2V* - 32V + 3)X7 + 62V(2V - 1)X,X, + 182V(2V - l)X 4 Xd 

Sn « [(2V - 1)(2V* - 82V + 3)*Xi« 

+ 42V(2V* - 32V + 3)(7JV* - 182V + 15)X,Xi 
+ 82V(2V* - 32V + 3)(132V* - 422V + 39)XtX, 

+ 122V(162V’ - 1062V* + 2852V* - 3602V + 180)X,X< 

+ 122V*(172V* - 712V* + 1172V - 69)X,Xi 
+ 42V(292V* - 1962V’ + 6372V* - 6932V + 361)Xj 
+ 482V*(262V* - 1262V* + 2132V - 129)X.XiXi 
+ 242V*(362V* - 1732V* + 3092V - 189)XjX, 

+ 242V*(622V’ - 3262V* + 6972V - 369)X4Xj 
+ 962V*(42V‘ - 92V + 6 )X 4 X,* + 2882V*(42V* - 92V + 6 )xjxj]. 


The University of Michigan,
Ann Arbor, Mich.



ON THE NON-EXISTENCE OF TESTS OF "STUDENT'S" HYPOTHESIS
HAVING POWER FUNCTIONS INDEPENDENT OF σ

By George B. Dantzig

1. Introduction. Consider a system of $n$ random variables $x_1, x_2, \dots, x_n$, where each is known to be normally distributed about the same but unknown mean $\xi$, and with the same, but also unknown, standard deviation $\sigma$. The assumption $H_0$ that $\xi$ has some specified value $\xi_0$, e.g. $\xi_0 = 0$, while nothing is assumed about $\sigma$, is known as "Student's" Hypothesis. Two aspects of the hypothesis $H_0$ have already been studied extensively. If the alternatives with respect to which it is desired to test $H_0$ assume specifically that $\xi > \xi_0$ (or $\xi < \xi_0$), then we have the so-called asymmetric case of "Student's Hypothesis," and it is known [1] that there exists a uniformly most powerful test of $H_0$. This consists in the rule, originally suggested by "Student," of rejecting $H_0$ whenever

$$t = \frac{(\bar x - \xi_0)\sqrt{n-1}}{s} > t_\alpha, \qquad (1)$$

where $\bar x$ and $s$ denote the mean and the standard deviation of the observed $x$'s, and $t_\alpha$ is taken, for example, from Fisher's Tables [2] with his $P = 2\alpha$. In other words, $t_\alpha$ is such that

$$P\{t > t_\alpha \mid H_0\} = \alpha, \qquad (2)$$

where $\alpha$ is the chosen level of significance. In accordance with the definition of the uniformly most powerful test, whenever any other rule $R$, offered to test the same hypothesis $H_0$, has the same probability $\alpha$ of $H_0$ being rejected when it is true, the power of this alternative test cannot exceed that of "Student's" test. In other words, if it happens that the true value of $\xi$ is not equal to $\xi_0$ but is greater, then the probability of this circumstance being detected by "Student's" test is at least equal to that corresponding to the rule $R$.

If the set of alternative hypotheses is not limited to those specifying the value of $\xi$ either greater or smaller than $\xi_0$, but includes both those categories, then it is known [1] that there is no uniformly most powerful test of the hypothesis $H_0$. However, in this case there exists a slightly different test, also based on "Student's" criterion, possessing the remarkable property of being unbiased of type $B_1$ [3]. The test, in common use for a long time, consists in rejecting $H_0$ when

$$|t| > t_\alpha, \qquad (3)$$

with $t_\alpha$ being taken again from Fisher's tables, this time corresponding to his $P = \alpha$, where $\alpha$ is the chosen level of significance.

In order to describe the optimum property of this test we must use the concept of the power function of a test [3]. Denote by $\beta(\xi, \sigma)$ the probability of the hypothesis $H_0$ being rejected when $\xi$ and $\sigma$ are the true mean and the true standard error of the observable $x$'s. The function $\beta(\xi, \sigma)$ is just what is called the power function of the test. If we substitute $\xi = \xi_0$, then we shall have $\beta(\xi_0, \sigma) = \alpha$ irrespective of the value of $\sigma$. Now the optimum property of "Student's" test mentioned above consists in this: (1) its power function has a minimum at $\xi = \xi_0$, and this is true whatever be the value of $\sigma$; (2) whatever be any other test of the same hypothesis which has the same level of significance $\alpha$ and has property (1), its power function $\beta'(\xi, \sigma)$ cannot exceed that of "Student's" test.

These two properties, demonstrating the excellence of the criterion suggested by "Student," fully justify the general confidence in the test as described above, or in its extended form where it is applied to two or more samples. However, it is known that "Student's" test in both its forms, $t > t_\alpha$ and $|t| > t_\alpha$, has one very undesirable property which causes great difficulties in various problems of rational planning of experiments.

One of the most important questions to have in mind when planning an experiment is: What is the probability that the experiment and the subsequent statistical test will detect a difference or effect when it actually exists? If we perform an experiment and then apply some statistical analysis to test "Student's" hypothesis that $\xi = \xi_0$, we do hope that, if the actual value of $\xi$ is different from $\xi_0$, the test will discover this circumstance. But apart from mere hope, it is desirable to take precautions so that when the difference $\xi - \xi_0 = \Delta$ has some appreciable value, the chance of the hypothesis $H_0$ being rejected will be reasonably large. This may be done by calculating the value of the power function $\beta(\xi, \sigma)$ corresponding to the value $\xi = \xi_0 + \Delta$. And here we come to the unfortunate property of "Student's" test.

Although the form of the power function of "Student's" test is known and tabled [4], [5], [6], [7], there are occasionally considerable difficulties in applying these tables, because it appears that the values $n$ and $\Delta$ are not all its arguments: it also depends on $\sigma$. Consequently, in order to have an idea of the probability that the test will detect the falsehood of the hypothesis $H_0$ that $\xi = \xi_0$ when actually $\xi = \xi_0 + \Delta$, we need not only the knowledge of $n$ but also a likely value of $\sigma$. The latter is known accurately only in exceptional cases, and in those cases one would apply a test which is different from "Student's" test. Usually we have only a vague notion of the magnitude of $\sigma$, and accordingly the tables of $\beta(\xi, \sigma)$ may be used to obtain a rough idea as to whether the arrangement of the experiment planned is satisfactory or not. Frequently we have no idea of what may be the values of $\sigma$.
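A simulation shows the dependence on $\sigma$ directly. The sketch below is a Monte Carlo illustration, not part of the original argument. It assumes $n = 10$, $\xi_0 = 0$, a fixed difference $\Delta = 1$ and the usual two-sided 5 per cent point $t_\alpha = 2.262$ for 9 degrees of freedom. It estimates $\beta(\xi_0, \sigma)$ and $\beta(\xi_0 + \Delta, \sigma)$ for test (3) at several values of $\sigma$, with $t$ computed as in (1) from the standard deviation $s$ taken with divisor $n$. The function names are hypothetical.

```python
import math
import random


def t_statistic(xs, xi0):
    """t of (1): (mean - xi0) * sqrt(n - 1) / s, where s uses divisor n."""
    n = len(xs)
    mean = sum(xs) / n
    s = math.sqrt(sum((x - mean) ** 2 for x in xs) / n)
    return (mean - xi0) * math.sqrt(n - 1) / s


def power_two_sided(xi, sigma, xi0=0.0, n=10, t_alpha=2.262, reps=4000, seed=1):
    """Monte Carlo estimate of beta(xi, sigma) for the rule |t| > t_alpha.

    t_alpha = 2.262 is the two-sided 5 per cent point of t for 9 degrees of
    freedom, so it matches the default n = 10; change both together.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        xs = [rng.gauss(xi, sigma) for _ in range(n)]
        if abs(t_statistic(xs, xi0)) > t_alpha:
            hits += 1
    return hits / reps


if __name__ == "__main__":
    for sigma in (0.5, 1.0, 2.0, 4.0):
        print(f"sigma={sigma}: beta(xi0)={power_two_sided(0.0, sigma):.3f}, "
              f"beta(xi0 + 1)={power_two_sided(1.0, sigma):.3f}")
```

Under $\xi = \xi_0$ the estimated rejection rate should stay near 0.05 whatever $\sigma$ is. At $\xi = \xi_0 + \Delta$ it falls steadily as $\sigma$ grows, which is why a tabled power function cannot be used without a value of $\sigma$.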

To Dr. P. L. Hsu is due the idea of looking for tests, the power of which is 
independent of the parameters unspecified by the hypothesis tested. In an 
unpublished paper, he proved among other things that the X test of the general
linear hypothesis is the most powerful of all those, the power function of which 
depends on the same argument as that of the X test and not on other parameters. 
The above circumstances suggest the following problem: to see whether it is 
possible to devise a test of “Student’s” hypothesis such that its power function 
would be independent of a. If such a test could be devised and proved to be 
reasonably powerful then the tables of its power function could be used for the 
purpose of planning experiments. 

The purpose of the present paper is to show that no such test exists and, 
consequently, this negative result implies in still another way that it is im- 
possible to improve on the test originally suggested by “Student.” 

2. Statement of the Problem. The problem of finding a test whose power function is independent of $\sigma$ is equivalent to finding a critical region $w$ such that the value of the power function

$$\beta(\xi, \sigma) = P\{E \in w \mid \xi, \sigma\} \qquad (4)$$

for any fixed $\xi$ is independent of the value of $\sigma$, where $E$ denotes the sample point $(x_1, x_2, \dots, x_n)$. We shall show specifically that if this is the case, then the power function is also independent of $\xi$, so that the test will reject the hypothesis tested with the same frequency whether it be correct or wrong.

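3. Theorem. If there exists a region $w$ such that, whatever be the value of $\sigma$,

$$\int\cdots\int_w \frac{1}{(\sigma\sqrt{2\pi})^n}\,e^{-\frac{1}{2\sigma^2}\sum_{i=1}^n (x_i - \xi_0)^2}\,dx_1\cdots dx_n = \alpha, \qquad (5)$$

$$\int\cdots\int_w \frac{1}{(\sigma\sqrt{2\pi})^n}\,e^{-\frac{1}{2\sigma^2}\sum_{i=1}^n (x_i - \xi_1)^2}\,dx_1\cdots dx_n = \beta, \qquad (6)$$

where $\xi_0 \ne \xi_1$, and $\alpha$ and $\beta$ are constants, then

$$\alpha = \beta. \qquad (7)$$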

A region $w$ is called similar [1] to the whole sample space $W$, of size $\alpha$, with respect to a set of elementary probability laws $p(E \mid \theta)$ given in terms of a parameter $\theta$, if $P\{E \in w \mid \theta\} = \alpha$, whatever be the value of $\theta$. Essentially, then, the region $w$ above is a similar region with respect to two different sets of elementary laws, each being given parametrically in terms of the parameter $\sigma$.

Denote by $w_r$ the portion of the surface of the hypersphere $\sum_{i=1}^{n}(x_i - \xi_0)^2 = r^2$ which is common to $w$, and let the total surface be denoted by $W_r$. Neyman and Pearson have shown [1] that a necessary and sufficient condition that $w$ be a similar region, in the above case, is that, whatever be $r$, the probability that the sample point $E$ will fall on the subsurface $w_r$, when it is known that the sample point lies on the surface $W_r$, is $\alpha$, i.e.

$$P\{E \in w_r \mid (E \in W_r)(\xi = \xi_0)\} = \alpha \qquad (8)$$

for all $r$.

In a similar manner let $w_\rho$ denote the portion of the surface of the hypersphere $\sum_{i=1}^{n}(x_i - \xi_1)^2 = \rho^2$ common to $w$, and let the total surface be denoted by $W_\rho$. Since $w$ is similar with respect to the set of probability laws indicated in (6), we have also

$$P\{E \in w_\rho \mid (E \in W_\rho)(\xi = \xi_1)\} = \beta \qquad (9)$$

for all $\rho$.

Since on the surface $W_r$ the elementary probability law

$$p(E \mid \xi_0, \sigma) = \frac{1}{(\sigma\sqrt{2\pi})^n}\,e^{-r^2/2\sigma^2} \qquad (10)$$

is constant, we see that an equivalent statement of (8) is that the hyper-area of $w_r$ is a constant proportion, $\alpha$, of the total hyper-area of $W_r$. Similarly, from (9), we have that the hyper-area of $w_\rho$ is a constant proportion, $\beta$, of the area of the hypersurface $W_\rho$, whatever be the values of $r$ and $\rho$.

Consider the transformation which expresses $x_1, x_2, \dots, x_n$ in terms of generalized polar coordinates with pole at the point $(\xi_0, \xi_0, \dots, \xi_0)$, i.e.

$$\begin{aligned}
x_1 - \xi_0 &= r\cos\theta_2\cos\theta_3\cdots\cos\theta_{n-2}\cos\theta_{n-1}\cos\theta_n,\\
x_2 - \xi_0 &= r\cos\theta_2\cos\theta_3\cdots\cos\theta_{n-2}\cos\theta_{n-1}\sin\theta_n,\\
x_3 - \xi_0 &= r\cos\theta_2\cos\theta_3\cdots\cos\theta_{n-2}\sin\theta_{n-1},\\
&\;\;\vdots\\
x_{n-1} - \xi_0 &= r\cos\theta_2\sin\theta_3,\\
x_n - \xi_0 &= r\sin\theta_2.
\end{aligned} \qquad (11)$$

Let $\Delta$ be the Jacobian of the transformation:

$$|\Delta| = r^{n-1}\prod_{i=2}^{n-1}\cos^{n-i}\theta_i = r^{n-1}f(\theta). \qquad (12)$$

Consider also a transformation which expresses $(x_1, x_2, \dots, x_n)$ in terms of polar coordinates, the point $(\xi_1, \xi_1, \dots, \xi_1)$ being the pole. It may be obtained by replacing in (11) $\xi_0$ by $\xi_1$, $r$ by $\rho$, and $\theta_i$ by $\theta_i'$. The Jacobian of this transformation is given by $|\Delta'| = \rho^{n-1}f(\theta')$.

We are now able to express the hyper-area of $W_r$:

$$\int\cdots\int_{W_r} |\Delta|\,d\theta_2\,d\theta_3\cdots d\theta_n = K r^{n-1}, \qquad (13)$$

where the integral $K > 0$ is a constant independent of $r$. Similarly, the hyper-area of $W_\rho$ is $K\rho^{n-1}$, where $K$ is the same as in (13). According to (8) and (9) we now have

$$\int\cdots\int_{w_r} |\Delta|\,d\theta_2\,d\theta_3\cdots d\theta_n = \alpha K r^{n-1}, \qquad (14)$$

$$\int\cdots\int_{w_\rho} |\Delta'|\,d\theta_2'\,d\theta_3'\cdots d\theta_n' = \beta K \rho^{n-1}. \qquad (15)$$

Let us consider the distances between the three points $(x_1, x_2, \dots, x_n)$, $(\xi_0, \xi_0, \dots, \xi_0)$, and $(\xi_1, \xi_1, \dots, \xi_1)$. The distances of the first point to the second point and to the third point we have already denoted by $r$ and $\rho$. Let the distance between the last two be $L$. Then, since the sum of two sides of a triangle is at least equal to the third side, we have

$$r \le \rho + L, \qquad \rho \le r + L, \qquad\text{where } L = \sqrt{n}\,|\xi_0 - \xi_1|. \qquad (16)$$

Let $\varphi(t) \ge 0$ be an arbitrary monotonic nonincreasing function of $t$ such that the product $t^{n-1}\varphi(t)$ is integrable from 0 to $+\infty$. Since $\varphi(t)$ is a nonincreasing function, it follows from (16) that

$$\varphi(r) \ge \varphi(\rho + L) \quad\text{and}\quad \varphi(\rho) \ge \varphi(r + L). \qquad (17)$$

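Consider the integral $I$:

$$I = \int\cdots\int_w \varphi(r)\,dx_1\,dx_2\cdots dx_n. \qquad (18)$$

We shall express it in terms of the variables $r, \theta_2, \dots, \theta_n$ and also in terms of $\rho, \theta_2', \dots, \theta_n'$, and compare the results. Thus

$$\begin{aligned}
I &= \int\cdots\int_w |\Delta|\,\varphi(r)\,dr\,d\theta_2\cdots d\theta_n\\
&= \int_0^\infty \varphi(r)\,dr\int\cdots\int_{w_r}|\Delta|\,d\theta_2\cdots d\theta_n\\
&= \alpha K\int_0^\infty r^{n-1}\varphi(r)\,dr.
\end{aligned} \qquad (19)$$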


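Also we have, by (16) and (17),

$$\begin{aligned}
I &= \int\cdots\int_w |\Delta'|\,\varphi(r)\,d\rho\,d\theta_2'\cdots d\theta_n'\\
&\ge \int\cdots\int_w |\Delta'|\,\varphi(\rho + L)\,d\rho\,d\theta_2'\cdots d\theta_n'\\
&= \int_0^\infty \varphi(\rho + L)\,d\rho\int\cdots\int_{w_\rho}|\Delta'|\,d\theta_2'\cdots d\theta_n',
\end{aligned} \qquad (20)$$

and consequently

$$I \ge \beta K\int_0^\infty \rho^{n-1}\varphi(\rho + L)\,d\rho. \qquad (21)$$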

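Since $K > 0$, we have from (19) and (21)

$$\frac{\alpha}{\beta} \ge \frac{\int_0^\infty t^{n-1}\varphi(t + L)\,dt}{\int_0^\infty t^{n-1}\varphi(t)\,dt}. \qquad (22)$$

By interchanging $\rho$ and $r$ in (18), (19), (20), and (21), we also have

$$\frac{\beta}{\alpha} \ge \frac{\int_0^\infty t^{n-1}\varphi(t + L)\,dt}{\int_0^\infty t^{n-1}\varphi(t)\,dt}. \qquad (23)$$

Let us set in (22) and (23) $\varphi(t) = e^{-pt}$ and $\varphi(t + L) = e^{-pL}e^{-pt}$, where $p > 0$ is arbitrary. Then

$$\frac{\alpha}{\beta} \ge e^{-pL} \quad\text{and}\quad \frac{\beta}{\alpha} \ge e^{-pL}. \qquad (24)$$

Since (24) holds for all $p > 0$, let $p$ approach zero. Then $\lim_{p\to0} e^{-pL} = 1$, and the above inequalities can hold only if

$$\alpha = \beta. \qquad (25)$$

Q.E.D.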
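The passage from (22) and (23) to (24) uses only the fact that, for $\varphi(t) = e^{-pt}$, the shift by $L$ factors out of the integral. The computation is written out below as a worked step; it adds nothing beyond the text. It also confirms that this $\varphi$ is admissible, being nonnegative and nonincreasing with $t^{n-1}\varphi(t)$ integrable, for every $p > 0$.

```latex
\frac{\int_0^\infty t^{n-1}e^{-p(t+L)}\,dt}{\int_0^\infty t^{n-1}e^{-pt}\,dt}
  = e^{-pL}\,\frac{\int_0^\infty t^{n-1}e^{-pt}\,dt}{\int_0^\infty t^{n-1}e^{-pt}\,dt}
  = e^{-pL},
\qquad
\int_0^\infty t^{n-1}e^{-pt}\,dt = \frac{(n-1)!}{p^{n}} < \infty .

% Combining the two halves of (24):
e^{-pL} \le \frac{\alpha}{\beta} \le e^{pL}
\quad\text{for every } p > 0
\;\Longrightarrow\; \frac{\alpha}{\beta} = 1 \quad (p \to 0).
```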

It is of interest to note that there do exist regions such that the power function is independent of both $\xi$ and $\sigma$. For example, let $S_n$ be the standard deviation of the observed values $(x_1, x_2, \dots, x_n)$ and let $S_{n-1}$ be the standard deviation of the values $(x_1, x_2, \dots, x_{n-1})$. Then the region $w$ given by all points $(x_1, x_2, \dots, x_n)$ which satisfy the inequality $S_{n-1}/S_n \le C$ is such a region, i.e.

$$P\{S_{n-1}/S_n \le C \mid \xi, \sigma\} \qquad (26)$$

is constant, whatever be the values of $\xi$ and $\sigma$. Such regions are, however, unsuitable for testing "Student's" hypothesis $\xi = \xi_0$, because they will reject this hypothesis with equal frequency whether it is wrong or correct.
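The invariance in (26) holds because $S_{n-1}/S_n$ is unchanged when every $x_i$ is shifted by the same amount or multiplied by the same positive constant. The Monte Carlo sketch below is an illustration only. Its choices of $n = 10$, $C = 1$ and the grid of $(\xi, \sigma)$ values are arbitrary assumptions, and the function names are hypothetical.

```python
import random


def sd(values):
    """Standard deviation with divisor equal to the number of values."""
    m = sum(values) / len(values)
    return (sum((v - m) ** 2 for v in values) / len(values)) ** 0.5


def rejection_rate(xi, sigma, n=10, C=1.0, reps=5000, seed=7):
    """Monte Carlo frequency with which S_{n-1}/S_n <= C for normal samples."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        xs = [rng.gauss(xi, sigma) for _ in range(n)]
        if sd(xs[:-1]) / sd(xs) <= C:
            hits += 1
    return hits / reps


if __name__ == "__main__":
    for xi in (0.0, 3.0):
        for sigma in (0.5, 2.0):
            print(xi, sigma, rejection_rate(xi, sigma))
```

Because the same seed is reused, each setting rescales and shifts the same standard normal draws. The four estimates therefore coincide, which is the invariance argument in miniature. With fresh seeds they would differ only by sampling error.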

The author is indebted to Professor J. Neyman for assistance in preparing 
the present paper. 
A METHOD FOR RECURRENT COMPUTATION OF ALL THE
PRINCIPAL MINORS OF A DETERMINANT, AND ITS
APPLICATION IN CONFLUENCE ANALYSIS


By Olav Reiersøl

1. Recurrent computation of all the principal minors of a determinant.

The formulae which I develop in this paper have been worked out for use in statistical confluence analysis. By means of recurrent computation they considerably shorten the amount of work required to compute all principal minors of a square matrix. Originally I elaborated this method as a simplification of one given by Frisch (not published).

Subsequently I found that the method could more easily be deduced from the pivotal method. This method has been described, for example, by Whittaker and Robinson [5] and by Aitken [1].

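Let us consider a square $n$-rowed matrix

$$\begin{Vmatrix} a_{11} & a_{12} & \cdots & a_{1n}\\ a_{21} & a_{22} & \cdots & a_{2n}\\ \vdots & \vdots & & \vdots\\ a_{n1} & a_{n2} & \cdots & a_{nn}\end{Vmatrix}. \qquad (1)$$

Let the adjoint of this matrix be $\|p_{ij}\|$, and let us denote the determinant value of (1) by $D_{12\cdots n}$.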

Then we have the following identity:

$$\begin{vmatrix} p_{n-1,n-1} & p_{n-1,n}\\ p_{n,n-1} & p_{n,n}\end{vmatrix} = D_{12\cdots n}\,D_{12\cdots n-2}. \qquad (2)$$

As Aitken points out, the pivotal method is based upon this identity.

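Next consider the following matrix, which is formed from the matrix (1) by striking out the $n$th row and the $(n-1)$th column:

$$\begin{Vmatrix} a_{11} & \cdots & a_{1,n-2} & a_{1,n}\\ \vdots & & \vdots & \vdots\\ a_{n-2,1} & \cdots & a_{n-2,n-2} & a_{n-2,n}\\ a_{n-1,1} & \cdots & a_{n-1,n-2} & a_{n-1,n}\end{Vmatrix}. \qquad (3)$$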
Let us denote its adjoint by $\|q_{ij}\|$ and its determinant value by $A_{12\cdots n} = -p_{n-1,n}$. The determinant

$$\begin{vmatrix} a_{11} & \cdots & a_{1,n-2} & a_{1,n-1}\\ \vdots & & \vdots & \vdots\\ a_{n-2,1} & \cdots & a_{n-2,n-2} & a_{n-2,n-1}\\ a_{n,1} & \cdots & a_{n,n-2} & a_{n,n-1}\end{vmatrix}$$

we shall denote by $B_{12\cdots n}$.

The identity (2) can now be written

$$D_{12\cdots n} = \frac{D_{12\cdots n-2,n}\,D_{12\cdots n-1} - A_{12\cdots n}B_{12\cdots n}}{D_{12\cdots n-2}}. \qquad (2')$$

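If we apply the identity (2) to the matrix (3), we get

$$\begin{vmatrix} q_{n-2,n-2} & q_{n-2,n-1}\\ q_{n-1,n-2} & q_{n-1,n-1}\end{vmatrix} = A_{12\cdots n}\,D_{12\cdots n-3},$$

which may also be written

$$A_{12\cdots n} = \frac{A_{12\cdots n-3,n-1,n}\,D_{12\cdots n-2} - A_{12\cdots n-2,n}\,B_{12\cdots n-1}}{D_{12\cdots n-3}}. \qquad (4)$$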


To simplify the notation, we will not write the affixes present, but will write the affixes not present, in inverted parentheses. Then our formulae (2') and (4) can be written

$$D = \frac{D_{)n-1(}\,D_{)n(} - AB}{D_{)n-1,n(}},$$

$$A = \frac{A_{)n-2(}\,D_{)n-1,n(} - A_{)n-1(}\,B_{)n(}}{D_{)n-2,n-1,n(}}.$$

In an analogous way we get

$$B = \frac{B_{)n-2(}\,D_{)n-1,n(} - B_{)n-1(}\,A_{)n(}}{D_{)n-2,n-1,n(}}.$$

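We may apply these formulae to an arbitrary principal minor $D_{\nu_1\nu_2\cdots\nu_k}$. Let us now denote $D_{\nu_1\nu_2\cdots\nu_k}$ by $D$, and denote the absence of one or more of the numbers $\nu_1, \nu_2, \dots$ by placing them in inverted parentheses. We then have the formulae:

$$A = \frac{A_{)\nu_{k-2}(}\,D_{)\nu_{k-1}\nu_k(} - A_{)\nu_{k-1}(}\,B_{)\nu_k(}}{D_{)\nu_{k-2}\nu_{k-1}\nu_k(}}, \qquad (5a)$$

$$B = \frac{B_{)\nu_{k-2}(}\,D_{)\nu_{k-1}\nu_k(} - B_{)\nu_{k-1}(}\,A_{)\nu_k(}}{D_{)\nu_{k-2}\nu_{k-1}\nu_k(}}, \qquad (5b)$$

$$D = \frac{D_{)\nu_{k-1}(}\,D_{)\nu_k(} - AB}{D_{)\nu_{k-1}\nu_k(}}. \qquad (5c)$$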
By means of these formulae we can recurrently compute all principal minors. We begin with $D_i = a_{ii}$, $i = 1, 2, \dots, n$; $A_{ij} = a_{ij}$, $B_{ij} = a_{ji}$, where $i < j$. Then we compute the $D$'s with two affixes,

$$D_{ij} = D_iD_j - A_{ij}B_{ij},$$

and then the quantities $A$, $B$, $D$ with three affixes,

$$A_{ijk} = A_{jk}D_i - A_{ik}B_{ij},$$

$$B_{ijk} = B_{jk}D_i - B_{ik}A_{ij},$$

$$D_{ijk} = \frac{D_{ik}D_{ij} - A_{ijk}B_{ijk}}{D_i}, \qquad i < j < k.$$

Then we compute the quantities $A$, $B$, $D$ with four affixes, and so on.

If we carry through the computations without dropping any figures, we have as a control that all divisions will be exact, without remainder. If we are dropping figures, we can control the result by computing the determinant $D_{12\cdots n}$ in another way. If we wish to control the computation before it is completed, we may use our recurrence formulae on the matrix obtained from the original matrix when the rows and the columns are subjected to the same permutation; for example, we can reverse the order of the rows and columns. Then we can control the $(k-1)$-rowed minors before computing the $k$-rowed minors.
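The recurrences for $A$, $B$ and $D$ given above translate almost line for line into code. The sketch below is an illustration under stated assumptions, not the author's worksheet. Indices are 0-based, subsets are Python tuples in increasing order, and every principal minor used as a divisor is assumed to be non-zero. Exact rational arithmetic (`fractions.Fraction`) is used so that the control "all divisions are exact" can be observed directly for an integer matrix. The names `principal_minors` and `det_direct` are hypothetical. `det_direct` is plain Gaussian elimination and serves only as the independent check of $D_{12\cdots n}$ that the text recommends.

```python
from fractions import Fraction
from itertools import combinations


def _drop(S, *positions):
    """Return subset S with the entries at the given positions removed."""
    return tuple(s for p, s in enumerate(S) if p not in positions)


def principal_minors(a):
    """All principal minors of the square matrix a via the A, B, D recurrences.

    Returns a dict mapping each index tuple (i1 < i2 < ...) to its minor D.
    Assumes every lower-order principal minor used as a divisor is non-zero.
    """
    n = len(a)
    a = [[Fraction(x) for x in row] for row in a]
    D = {(): Fraction(1)}
    A, B = {}, {}
    for i in range(n):
        D[(i,)] = a[i][i]
    for i, j in combinations(range(n), 2):
        A[(i, j)] = a[i][j]  # strike row j and column i
        B[(i, j)] = a[j][i]  # strike row i and column j
        D[(i, j)] = D[(i,)] * D[(j,)] - A[(i, j)] * B[(i, j)]
    for k in range(3, n + 1):
        for S in combinations(range(n), k):
            # positions k-3, k-2, k-1 of S hold nu_{k-2}, nu_{k-1}, nu_k
            den = D[_drop(S, k - 3, k - 2, k - 1)]
            d_mid = D[_drop(S, k - 2, k - 1)]
            A[S] = (A[_drop(S, k - 3)] * d_mid - A[_drop(S, k - 2)] * B[_drop(S, k - 1)]) / den
            B[S] = (B[_drop(S, k - 3)] * d_mid - B[_drop(S, k - 2)] * A[_drop(S, k - 1)]) / den
            D[S] = (D[_drop(S, k - 2)] * D[_drop(S, k - 1)] - A[S] * B[S]) / d_mid
    return D


def det_direct(m):
    """Determinant by Gaussian elimination in exact arithmetic (independent check)."""
    m = [[Fraction(x) for x in row] for row in m]
    n, det = len(m), Fraction(1)
    for c in range(n):
        p = next((r for r in range(c, n) if m[r][c] != 0), None)
        if p is None:
            return Fraction(0)
        if p != c:
            m[c], m[p] = m[p], m[c]
            det = -det
        det *= m[c][c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for cc in range(c, n):
                m[r][cc] -= f * m[c][cc]
    return det


M = [[4, 1, 2, 0], [1, 5, 1, 1], [0, 2, 6, 1], [1, 0, 1, 7]]
minors = principal_minors(M)
print(minors[(0, 1, 2, 3)], det_direct(M))
```

The test matrix is strictly diagonally dominant, so none of its principal minors can vanish and the divisor assumption is met. For an integer matrix every stored value has denominator 1, which is the exact-division control described in the text.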

If all the $D$'s are different from zero, we may reduce the necessary number of multiplications and divisions in the following way. We introduce the following notations:

$$a = \frac{A}{D_{)\nu_{k-1}\nu_k(}}, \qquad b = \frac{B}{D_{)\nu_{k-1}\nu_k(}}.$$

Substituting in (5), we get the following system of recurrence formulae:


(6a) 


(6b) 

6 = &)•*-»( + 

(6c) 

b 

c « - - — 


(6d) 

d — 

(fie) 

D ■■ 



An affix $)\nu_k($ on a letter indicates the deletion of the last row and column in the determinants making up the definition of that letter, even though those determinants are of lower order than $\nu_k$. Similarly, an affix $)\nu_{k-1}($ indicates the deletion of the next to the last row and column.

The $a$'s with two affixes in these formulae are identical with the elements $a_{ij}$ of the matrix (1), where $i < j$. Further, $b_{ij} = a_{ji}$, $i < j$, and $d_i = a_{ii}$. In applying the recurrence formulae (6), we start with these values.

If the matrix (1) is symmetric, i.e. if $a_{ij} = a_{ji}$, then we get

$$B = A$$

and

$$b = a.$$

In this case we can therefore replace $B$ by $A$ in the formulae (5) and replace $b$ by $a$ in the formulae (6).

Numerical example. Let us compute all the scatterances in the constructed example given by Frisch [3, p. 121]. The correlation matrix in this example is:

 1.000000   -0.121551    0.656809    0.752502   -0.224549
-0.121551    1.000000    0.657698   -0.732862    0.212165
 0.656809    0.657698    1.000000    0.014385   -0.040183
 0.752502   -0.732862    0.014385    1.000000   -0.280223
-0.224549    0.212165   -0.040183   -0.280223    1.000000

Using our recurrence formulae (6) we get the following table:

Affixes        a             c              d             D
12        -0.121 551     0.121 551     0.985 225     0.985 225
13         0.656 809    -0.656 809     0.568 602     0.568 602
23         0.657 698    -0.657 698     0.567 433     0.567 433
14         0.752 502    -0.752 502     0.433 741     0.433 741
24        -0.732 862     0.732 862     0.462 913     0.462 913
34         0.014 385    -0.014 385     0.999 793     0.999 793
15        -0.224 549     0.224 549     0.949 578     0.949 578
25         0.212 165    -0.212 165     0.954 986     0.954 986
35        -0.040 183     0.040 183     0.998 385     0.998 385
45        -0.280 223     0.280 223     0.921 475     0.921 475
123        0.737 534    -0.748 594     0.016 489     0.016 245
124       -0.641 395     0.651 014     0.016 184     0.015 945
134       -0.479 865     0.843 938     0.028 765     0.016 356
234        0.496 387    -0.874 794     0.028 677     0.016 272
125        0.184 871    -0.187 643     0.914 888     0.901 371
135        0.107 303    -0.188 714     0.929 328     0.528 418
235       -0.179 723     0.316 730     0.898 062     0.509 590
145       -0.111 249     0.256 487     0.921 044     0.399 495
245       -0.124 786     0.269 467     0.921 272     0.426 616
345       -0.279 646     0.279 703     0.920 167     0.919 977
1234       0.000 279    -0.016 6       0.016 179     0.000 262 83
1235      -0.031 090     1.885 5       0.856 268     0.013 910
1245       0.009 105    -0.562 6       0.909 766     0.014 506
1345      -0.020 692     0.719 35      0.914 443     0.014 957
2345       0.032 486    -1.132 8       0.861 262     0.014 014
12345      0.009 621    -0.594 7       0.850 546     0.000 223 55
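For a symmetric matrix such as this one, the columns of the table can be regenerated with a short script. The relations used below are an assumption inferred from the numbers in the table; they reproduce, for instance, $a_{123} = 0.737534$ and $D_{123} = 0.016245$. They also follow algebraically from (5a) and (5c) when $B = A$. Taking $a = A/D_{)\nu_{k-1}\nu_k(}$ and $d = D/D_{)\nu_k(}$, they read

- $c = -a/d_{)\nu_k(}$,
- $a = a_{)\nu_{k-2}(} + a_{)\nu_{k-1}(}\,c_{)\nu_k(}$ for three or more affixes,
- $d = d_{)\nu_{k-1}(} + ac$,
- $D = d\,D_{)\nu_k(}$.

They are offered as a sketch, not as a transcription of the author's formulae (6). The function name `scatterances` is hypothetical. Double-precision results can be expected to differ from the hand-computed table in the last digits.

```python
from itertools import combinations

# Frisch's correlation matrix as printed above (0-based indices in the code).
R = [[1.000000, -0.121551, 0.656809, 0.752502, -0.224549],
     [-0.121551, 1.000000, 0.657698, -0.732862, 0.212165],
     [0.656809, 0.657698, 1.000000, 0.014385, -0.040183],
     [0.752502, -0.732862, 0.014385, 1.000000, -0.280223],
     [-0.224549, 0.212165, -0.040183, -0.280223, 1.000000]]


def _drop(S, *positions):
    """Return subset S with the entries at the given positions removed."""
    return tuple(s for p, s in enumerate(S) if p not in positions)


def scatterances(r):
    """Principal minors of a symmetric matrix via the reduced quantities a, c, d.

    Returns dictionaries (a, c, d, D) keyed by 0-based index tuples.
    Assumes all principal minors, hence all d's, are non-zero.
    """
    n = len(r)
    D, d = {(): 1.0}, {(): 1.0}
    a, c = {}, {}
    for i in range(n):
        d[(i,)] = D[(i,)] = r[i][i]
    for k in range(2, n + 1):
        for S in combinations(range(n), k):
            if k == 2:
                a[S] = r[S[0]][S[1]]
            else:
                a[S] = a[_drop(S, k - 3)] + a[_drop(S, k - 2)] * c[_drop(S, k - 1)]
            c[S] = -a[S] / d[_drop(S, k - 1)]
            d[S] = d[_drop(S, k - 2)] + a[S] * c[S]
            D[S] = d[S] * D[_drop(S, k - 1)]
    return a, c, d, D


a, c, d, D = scatterances(R)
for S in [(0, 1), (0, 1, 2), (0, 1, 2, 3), (0, 1, 2, 3, 4)]:
    print("".join(str(s + 1) for s in S), round(a[S], 6), round(d[S], 6), D[S])
```

Carrying more figures than the hand computation did is the main practical difference. The recurrence itself needs no pivoting, provided every minor stays away from zero.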


2. Computation of the coefficients of the characteristic polynomial of a matrix. The characteristic polynomial of the matrix (1) is

$$P(\lambda) = \begin{vmatrix} a_{11} - \lambda & a_{12} & \cdots & a_{1n}\\ a_{21} & a_{22} - \lambda & \cdots & a_{2n}\\ \vdots & \vdots & & \vdots\\ a_{n1} & a_{n2} & \cdots & a_{nn} - \lambda\end{vmatrix} = P_n - P_{n-1}\lambda + P_{n-2}\lambda^2 - \cdots + (-1)^n\lambda^n.$$

As is well known, the coefficient $P_k$ can be calculated as the sum of all the $k$-rowed principal minors of the matrix (1). Our method of computing all the principal minors of a matrix therefore gives us, as a by-product, a method of computing the coefficients of the characteristic polynomial. Another method for the determination of these coefficients has been given by Paul Horst [4].
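The statement that $P_k$ is the sum of the $k$-rowed principal minors is easy to confirm on a small example. For brevity, the sketch below computes each minor directly by exact Gaussian elimination rather than by the recurrences. That shortcut is an assumption made to keep the example self-contained. The function names are hypothetical. The check evaluates $\det(A - \lambda I)$ at several integer $\lambda$ and compares the result with $\sum_j (-1)^j P_{n-j}\lambda^j$.

```python
from fractions import Fraction
from itertools import combinations


def det(m):
    """Exact determinant by Gaussian elimination."""
    m = [[Fraction(x) for x in row] for row in m]
    n, result = len(m), Fraction(1)
    for col in range(n):
        piv = next((r for r in range(col, n) if m[r][col] != 0), None)
        if piv is None:
            return Fraction(0)
        if piv != col:
            m[col], m[piv] = m[piv], m[col]
            result = -result
        result *= m[col][col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for cc in range(col, n):
                m[r][cc] -= f * m[col][cc]
    return result


def char_poly_coefficients(a):
    """Return [P_0, P_1, ..., P_n], where P_k is the sum of the k-rowed principal
    minors (P_0 = 1), so that det(a - lam*I) = sum_j (-1)**j * P_{n-j} * lam**j."""
    n = len(a)
    P = [Fraction(1)]
    for k in range(1, n + 1):
        P.append(sum((det([[a[i][j] for j in S] for i in S])
                      for S in combinations(range(n), k)), Fraction(0)))
    return P


M = [[2, 1, 0], [1, 3, 1], [0, 1, 4]]
print(char_poly_coefficients(M))
```

In practice the inner `det` calls would be replaced by the dictionary of minors produced by the recurrences, so that each minor is computed only once.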

We may compare the work of computation entailed by the two methods by calculating the number of multiplications and divisions necessary with each. If our recurrence formulae (6) are used, two multiplications and one division are necessary for computing a 2-rowed minor, and four multiplications and one division for every minor with 3 or more rows. Consequently the total number of multiplications and divisions will be:

$$S_n = 3\binom{n}{2} + 5\sum_{k=3}^{n}\binom{n}{k} = 5\cdot2^n - (n^2 + 4n + 5).$$

On using Horst's method, the number of necessary multiplications and divisions will be found to be

$$H_n = \tfrac12(n-1)(n^3 + n + 2), \qquad n \text{ even},$$

$$H_n = \tfrac12(n-1)(n^3 + n^2 + n + 2), \qquad n \text{ odd}.$$
When n = 2, 3, ..., 12, S_n and H_n take the following values:

   n      S_n     H_n
   2        3       6
   3       14      41
   4       43     105
   5      110     314
   6      255     560
   7      558    1203
   8     1179    1827
   9     2438    3284
  10     4975    4554
  11    10070    7325
  12    20283    9581

We see that our method of computing the coefficients of the characteristic polynomial involves less calculation when n < 10, while Horst's method is superior when n ≥ 10.
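The counts S_n and H_n are easy to tabulate. The sketch below recomputes S_n from the per-minor costs stated above (3 operations for a 2-rowed minor, 5 for larger ones) and H_n from the formulas given for Horst's method, then reports which method needs fewer operations; the function names are illustrative.

```python
from math import comb


def ops_principal_minors(n):
    """S_n: 3 operations per 2-rowed minor, 5 per minor with 3 or more rows."""
    two_rowed = comb(n, 2)
    larger = sum(comb(n, k) for k in range(3, n + 1))
    return 3 * two_rowed + 5 * larger


def ops_horst(n):
    """H_n for Horst's method, using the even and odd formulas of the text."""
    if n % 2 == 0:
        return (n - 1) * (n ** 3 + n + 2) // 2
    return (n - 1) * (n ** 3 + n ** 2 + n + 2) // 2


for n in range(2, 13):
    s, h = ops_principal_minors(n), ops_horst(n)
    assert s == 5 * 2 ** n - (n ** 2 + 4 * n + 5)  # the closed form for S_n
    print(n, s, h, "minors" if s < h else "Horst")
```

The printed table agrees with the one above, and the switch from "minors" to "Horst" happens at n = 10.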

If our purpose is to find the characteristic roots of the matrix, we can do this with less computation without first finding the coefficients of the characteristic polynomial. See Aitken [2].

3. Applications in confluence analysis. The confluence analysis of Frisch is set forth in his book Statistical Confluence Analysis by Means of Complete Regression Systems [3].

The main method of this book is the "bunch analysis," which includes the computation of the adjoints of the correlation matrices of all sets of variates contained in the total set. In section 1, Frisch has described a preliminary analysis by means of scatterances. The scatterances are the principal minors of the correlation matrix of the total set of variates. If we carry through such an analysis, the recurrence formulae of section 1 of this paper give a rapid method for calculating all the scatterances.

Another application of the computation of all the scatterances arises in the determination of the correct time lags between variates in a structural equation. This problem will be treated in a paper on confluence analysis which will appear in the near future.
NOTES

This section is devoted to brief research and expository articles, notes on methodology 
and other short items. 


A CRITERION FOR TESTING THE HYPOTHESIS THAT TWO 
SAMPLES ARE FROM THE SAME POPULATION 

By W. J. Dixon 

1. Introduction. The purpose of this paper is to consider a criterion for testing the hypothesis that two samples have been drawn from populations with the same distribution function, assuming only that the cumulative distribution function common to the two populations is continuous. Let the two samples, O_n and O_m, be of size n and m respectively. We may assume n ≤ m without loss of generality. Suppose the elements u_1, ..., u_n of O_n are arranged in order from the smallest to the largest, that is, u_1 < u_2 < ⋯ < u_n. These may be represented as points along a line. The elements of O_m, represented as points on the same line, are then divided into (n + 1) groups by the first sample, O_n. Let m_1 be the number of points having a value less than u_1, m_i the number lying between u_{i−1} and u_i (i = 2, 3, ..., n), and m_{n+1} the number greater than u_n (m_{n+1} = m − m_1 − m_2 − ⋯ − m_n).

The criterion here proposed is*

$$C^2 = \sum_{i=1}^{n+1}\left(\frac{1}{n+1} - \frac{m_i}{m}\right)^2. \tag{1}$$
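A short sketch of (1): the helper `group_counts` (a hypothetical name) finds m_1, ..., m_{n+1} by binary search in the sorted first sample, and `dixon_c2` evaluates the criterion from those counts. Ties are ignored, which is harmless under the continuity assumption.

```python
from bisect import bisect_left


def group_counts(first, second):
    """m_1, ..., m_{n+1}: points of `second` falling in each gap of the sorted `first`."""
    u = sorted(first)
    counts = [0] * (len(u) + 1)
    for y in second:
        # bisect_left gives the number of u's below y, i.e. the index of y's gap.
        counts[bisect_left(u, y)] += 1
    return counts


def dixon_c2(counts):
    """C^2 = sum over i of (1/(n+1) - m_i/m)^2 for group counts m_1, ..., m_{n+1}."""
    groups = len(counts)  # n + 1
    m = sum(counts)
    return sum((1 / groups - mi / m) ** 2 for mi in counts)


print(group_counts([1, 3, 5], [0, 2, 2, 6]))                 # counts per gap
print(round(dixon_c2([0, 0, 0, 3, 0, 4, 0, 0, 2, 1, 0]), 3))  # first example of section 4
```

With the group counts of the first example in section 4 this reproduces C² ≈ .209.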


* A similar criterion for two samples of the same size was investigated (unpublished) by A. M. Mood. He found the mean and variance to be


Em 


2n+ 1 
8n ' 


<r5i 


- 


8(n - l)(2n -f 1) 

48n* 


It can be seen that this is the sum of the squares of the differences between the ordinates 
of the two cumulative sample distributions calculated at the jumps of the first sample 
distribution. 
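2. The mean and variance of C². The only case of continuous cumulative distribution functions F(x) of any interest in statistics is that in which dF(x) = f(x) dx, where f(x) is a probability density function. Let us write

$$p_1 = \int_{-\infty}^{u_1} f(x)\,dx, \quad p_2 = \int_{u_1}^{u_2} f(x)\,dx, \quad \ldots, \quad p_{n+1} = \int_{u_n}^{\infty} f(x)\,dx,$$

where of course p_{n+1} = 1 − p_1 − p_2 − ⋯ − p_n.

Now, the joint distribution law of the p_i is

$$P(p_1, \ldots, p_n) = n!\,dp_1 \cdots dp_n, \tag{2}$$

and the conditional distribution of the m_i given the p_i is

$$P(m_1, \ldots, m_{n+1} \mid p_1, \ldots, p_n) = \frac{m!}{m_1! \cdots m_{n+1}!}\,p_1^{m_1} p_2^{m_2} \cdots p_{n+1}^{m_{n+1}}. \tag{3}$$

Therefore the joint probability law of the m_i and p_i is

$$P(m, p) = \frac{n!\,m!}{m_1! \cdots m_{n+1}!}\,p_1^{m_1} p_2^{m_2} \cdots p_{n+1}^{m_{n+1}}\,dp_1 \cdots dp_n. \tag{4}$$

Let

$$\varphi(\theta) = \varphi(\theta_1, \ldots, \theta_{n+1}) = E\left[\exp \sum_{i=1}^{n+1} \theta_i\left(\frac{1}{n+1} - \frac{m_i}{m}\right)\right];$$

then

$$E(C^2) = \sum_{i=1}^{n+1} \left.\frac{\partial^2 \varphi}{\partial \theta_i^2}\right|_{\theta = 0}, \tag{5}$$

$$E\{(C^2)^2\} = \sum_{i=1}^{n+1}\sum_{j=1}^{n+1} \left.\frac{\partial^4 \varphi}{\partial \theta_i^2\,\partial \theta_j^2}\right|_{\theta = 0}, \tag{6}$$

and

$$\varphi(\theta) = \sum_m \int \exp\left[\sum_{i=1}^{n+1} \theta_i\left(\frac{1}{n+1} - \frac{m_i}{m}\right)\right] P(m, p), \tag{7}$$

where Σ_m denotes the usual multinomial summation over all integral values of m_i ≥ 0 for which Σ m_i = m, and the integration is over the generalized tetrahedron defined by p_i ≥ 0 and p_1 + p_2 + ⋯ + p_n ≤ 1. If we perform the summation first, we obtain

$$\varphi(\theta) = n!\,e^{\sum_{i=1}^{n+1}\theta_i/(n+1)} \int \left(p_1 e^{-\theta_1/m} + \cdots + p_{n+1} e^{-\theta_{n+1}/m}\right)^m dp_1 \cdots dp_n. \tag{8}$$

Differentiating twice with respect to θ_i and setting the θ's equal to zero, we get

$$\left.\frac{\partial^2 \varphi}{\partial \theta_i^2}\right|_{\theta=0} = n! \int \left[\left(\frac{1}{n+1} - p_i\right)^2 + \frac{p_i(1 - p_i)}{m}\right] dp_1 \cdots dp_n.$$

If we now integrate and sum from one to n + 1, we find

$$E(C^2) = \frac{n(n + m + 1)}{m(n+1)(n+2)}. \tag{9}$$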
Performing the operations indicated in (6), we obtain E{(C²)²}, from which we subtract [E(C²)]² and have as the variance of C²

$$\sigma^2_{C^2} = \frac{4n(m-1)(m+n+1)(m+n+2)}{m^3(n+2)^2(n+3)(n+4)}. \tag{10}$$

3. Significance values of C². If we let C²_α be defined as the smallest value of C² for which P(C² ≥ C²_α) ≤ α, then we can compute the value of C²_α fairly

TABLE I

Values of C²_α, α = 0.01, 0.05, 0.10

m \ n    2     3     4     5     6     7     8     9     10

4 


— 

.800 






5 



.800 

.833 







.750 

.800 

.833 

.857 




6 


.750 

.800 

.833 

.857 






.750 

.800 

.556 

.413 








.833 

.867 

.875 



7 


.750 

.800 

.588 

.612 

.467 




.667 

.750 

.555 

.425 

.449 

.426 






.800 

.833 

.867 

.656 

.670 


8 


.750 

.800 

.594 

.482 

.469 

.389 



.667 

.531 

.425 

.413 

.367 

.376 

.358 





.800 

.833 

.660 

.677 

.543 

.554 

9 


.750 

.602 

.448 

.413 

.431 

.395 

.381 


.667 

.552 

.454 

.389 

.363 

.356 

.321 

.307 




.800 

.833 

.677 

.556 

.549 

.480 .449 

10 

.667 

.750 

.480 

.493 

.437 

.415 

.349 

.340 .349 


.487 

.430 

.380 

.373 

.367 

.316 

.309 

.280 

readily for small values of m and n. The values of C²_α for m, n ≤ 10 are given in Table I for α = 0.01, 0.05 and 0.10. Since the distribution of C² is discontinuous, the probabilities P(C² ≥ C²_α) will, in general, be less than α.
It will be seen that if m and n increase indefinitely in the ratio n/m = γ, then nC² converges stochastically to γ + 1, whereas nC² ranges from 0 to n²/(n + 1), which indicates a tail to the right. This suggests that for larger values of m and n it is reasonable to try to fit the distribution of nC² by the method of moments using a distribution of the form

$$f(x)\,dx = \frac{k^{\nu/2}}{2^{\nu/2}\,\Gamma(\tfrac12\nu)}\,x^{\nu/2 - 1} e^{-kx/2}\,dx, \qquad 0 \le x < \infty, \tag{11}$$

which has

$$E(x^h) = \left(\frac{2}{k}\right)^h \frac{\Gamma(\tfrac12\nu + h)}{\Gamma(\tfrac12\nu)}.$$

Setting x = nC², we see that we can consider nkC² distributed as χ² with ν degrees of freedom. Of course, ν is not necessarily an integer, but tables may be used for approximate values of the probability that nkC² will exceed certain values,¹ or of the values of nkC² that will be exceeded a certain per cent of the time.² More exact values of the probability that nkC² will exceed a certain value may be found from a table of the incomplete Gamma function.³ To calculate k and ν directly, the following formulas, obtained by equating the mean and variance of (11) to the mean and variance of nC², may be used:

$$k = \frac{a\,m(n+2)}{n}, \qquad \nu = \frac{a\,n(n+m+1)}{n+1}, \tag{12}$$

where

$$a = \frac{m(n+3)(n+4)}{2(m-1)(m+n+2)(n+1)}.$$

If the fitted curve (11) is used to obtain significance values of nC², there is a tendency to reject slightly more than 100α% of the time, especially for small values of m and n. The error is probably due to fitting a curve having an infinite range to a variable of finite range. The discrepancy decreases as m and n increase.
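The fit (12) can be sketched as follows. `chi2_fit` follows (12) directly, while `chi2_fit_by_moments` (an illustrative cross-check, not taken from the text) matches the mean ν/k and variance 2ν/k² of (11) to n times (9) and n² times (10); both should give the same k and ν. Tail probabilities for fractional ν would need an incomplete-gamma routine, which this sketch leaves out.

```python
def c2_moments(n, m):
    """Mean (9) and variance (10) of C^2 under the hypothesis."""
    mean = n * (n + m + 1) / (m * (n + 1) * (n + 2))
    var = (4 * n * (m - 1) * (m + n + 1) * (m + n + 2)
           / (m ** 3 * (n + 2) ** 2 * (n + 3) * (n + 4)))
    return mean, var


def chi2_fit(n, m):
    """k and nu of (12); n*k*C^2 is then treated as chi-square with nu d.f."""
    a = m * (n + 3) * (n + 4) / (2 * (m - 1) * (m + n + 2) * (n + 1))
    return a * m * (n + 2) / n, a * n * (n + m + 1) / (n + 1)


def chi2_fit_by_moments(n, m):
    """Solve nu/k = n*E(C^2) and 2*nu/k^2 = n^2*Var(C^2) for k and nu."""
    mean, var = c2_moments(n, m)
    k = 2 * mean / (n * var)
    return k, n * k * mean


for n, m in [(9, 10), (12, 12), (15, 25)]:
    k, nu = chi2_fit(n, m)
    print(n, m, round(n * k, 3), round(k, 4), round(nu, 4))
```

For n = m = 12 this gives nk ≈ 65.07 and ν ≈ 8.938, and for n = 15, m = 25 it gives k ≈ 7.510 and ν ≈ 10.19, in line with the values used in Case 2 and in the second example below.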

The goodness of fit at the 0.01, 0.05 and 0.10 significance levels was tested for two cases.

Case 1. n = 9, m = 10; nk = 45.397, ν = 7.429.

The exact distribution in the region under consideration is the following:

  C²_α            ...   .26  ‖  .28   .30   .32  ‖  .34   .36   .40   .42  ‖  .44   .48  ...
  P(C² ≥ C²_α)    ...  .121  ‖ .090  .082  .072  ‖ .037  .033  .025  .026  ‖ .016  .007  ...

The values of C²_α from the fitted curve are C²_.01 = 0.422, C²_.05 = 0.323 and C²_.10 = 0.277. The double rules ‖ indicate the divisions (from the fitted curve) for α = 0.01, 0.05 and 0.10.

¹ Karl Pearson, Tables for Statisticians and Biometricians, Part I, Table XII.

² R. A. Fisher, Statistical Methods for Research Workers, Table III.

³ Tables of the Incomplete Gamma Function, Biometrika Office, London.
Case 2. n = 12, m = 12; nk = 65.068, ν = 8.938.

The important part of the exact distribution for our purposes is:


Cl 

■B 

.229 .243 .256 

.270 ... .826 

.340 .854 .881 ... 

P(C* ^ CJ) 

... .120 

.100 .078 .057 

.046 ... .017 

.014 .011 .000 ... 


The values of C²_α from the fitted curve are C²_.01 = 0.3316, C²_.05 = 0.2587 and C²_.10 = 0.2244.


4. Examples. 1. Two samples of ten members each are drawn, and it is desired to test, using a rejection region of size α, the hypothesis that these two samples could have originated from the same population, about which nothing is assumed except that it is continuous. The first sample was found to divide the second sample into the following groups: 0, 0, 0, 3, 0, 4, 0, 0, 2, 1, 0. Then

$$C^2 = \left(\tfrac{1}{11} - \tfrac{3}{10}\right)^2 + \left(\tfrac{1}{11} - \tfrac{4}{10}\right)^2 + \left(\tfrac{1}{11} - \tfrac{2}{10}\right)^2 + \left(\tfrac{1}{11} - \tfrac{1}{10}\right)^2 + 7\left(\tfrac{1}{11}\right)^2 = .209,$$

which we see from Table I is not a significant value even for α = 0.10, since C²_.10 = 0.269.

2. A sample of 15 divides a second of 25 into the following 16 groups: 0, 1, 0, 0, 5, 4, 1, 3, 9, 0, 0, 1, 0, 1, 0, 0. Then

$$C^2 = \left(\tfrac{1}{16} - \tfrac{5}{25}\right)^2 + \left(\tfrac{1}{16} - \tfrac{4}{25}\right)^2 + \left(\tfrac{1}{16} - \tfrac{3}{25}\right)^2 + \left(\tfrac{1}{16} - \tfrac{9}{25}\right)^2 + 4\left(\tfrac{1}{16} - \tfrac{1}{25}\right)^2 + 8\left(\tfrac{1}{16}\right)^2,$$

nC² = 2.302,   k = 7.511,   ν = 10.19,   nkC² = 17.296,

which gives a significant value for α = 0.10 but not for α = 0.05, since nkC²_.10 = 16.233 and nkC²_.05 = 18.568. Actually P(nkC² > 17.29) = .077.


5. Remarks. If we set W equal to the number of m_i which are zero and V = n + 1 − W, then V is the number of non-zero m_i; further, U differs from 2V by at most one, where U is the total number of runs, the criterion proposed in the paper of Wald and Wolfowitz in the present issue of the Annals of Mathematical Statistics. Now,

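$$W = \lim_{x \to 0} \sum_{i=1}^{n+1} x^{m_i}, \tag{13}$$

so that, setting

$$\psi = \sum_m \int \exp\left[\sum_{i=1}^{n+1} \theta_i\left(\frac{1}{n+1} - \frac{m_i}{m}\right)\right] \sum_{i=1}^{n+1} x^{m_i}\,P(m, p), \tag{14}$$

analogous to (7), we have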

E(WC²)

from which we can find


_ 2n(l — m) 
prc*<^F»c‘ = ^(n + 2)r»r+ir) 

and 


* ~ - (w + 3)(n + 4)( m + n - 1) 

Pr/c’ — Prc* + n + i){m + n + 2) ' 

If n/m = γ (a fixed constant) and n is large,

$$\rho \approx \frac{n}{n + m}$$

will be near 1 when n is much larger than m. This corresponds, in computing C², to dividing the smaller sample into subgroups by the larger. In this case U and C² give essentially the same information. When m and n are more nearly equal the two criteria are quite different. For n > m, C² has fewer possible values than for n < m, and is therefore a more sensitive test when n < m.

While it is doubtful that this test is biased for large samples, this question will not be considered in the present note.

Princeton University, 

Princeton, N. J. 


SIGNIFICANCE TEST FOR SPHERICITY OF A NORMAL n-VARIATE 

DISTRIBUTION 

By John W. Mauchly 

1. Introduction. This note is concerned with testing the hypothesis that a sample from a normal n-variate population is in fact from a population for which the variances are all equal and the correlations are all zero. A population having this symmetry will be called "spherical." Under a linear orthogonal transformation of variates, a spherical population remains spherical, and consequently the features of a sample which furnish information relevant to this hypothesis must be invariant under such transformations.

A situation for which this test is indicated arises when the sample consists of N n-dimensional vectors, for which the variates are the n components along coordinate axes known to be mutually perpendicular, but having an orientation which is, a priori at least, quite arbitrary. A specific application for two dimensions, treated elsewhere [1], may be mentioned. Each of N days furnishes a sine and a cosine Fourier coefficient for a given periodicity, and these, when plotted as ordinate and abscissa, yield a somewhat elliptical cloud of N points. The sine and cosine functions are orthogonal, and their variances have equal expectations for a random series. The arbitrary nature of the orientation of the axes appears here as the arbitrary choice of phase, or origin of time. Of the five ellipses studied, three could easily have come from circular (random) populations, and two showed highly significant ellipticity.

2. Likelihood ratio criterion for sphericity. The method of Neyman and Pearson [2] will be used to derive a test criterion which seems entirely suitable. Let Ω be the class of all normal n-variate populations, and let ω be the subclass of all normal n-variate populations satisfying the hypothesis of "sphericity." The likelihood ratio criterion is obtained by taking the ratio of the maximum of the likelihood for variation of all population parameters specifying ω to the maximum of the likelihood for variation of all population parameters specifying Ω. That is,

$$\lambda = \frac{P(\omega_{\max})}{P(\Omega_{\max})}. \tag{1}$$

For the set Ω, the probability law for a single observation of the n variates may be written

$$P = K\,|a^{ij}|^{1/2}\,e^{-\frac12\sum a^{ij}(x_i - a_i)(x_j - a_j)} \qquad (i, j = 1, 2, \ldots, n), \tag{2}$$

where a^{ij} is an element of the matrix ‖a_{ij}‖^{−1}, the a_{ij} being variances and covariances, a_i is the mean value of the i-th variate in the population, and K is a constant the value of which does not concern us here. Then a sample of N from Ω has the probability

$$P = K^N |a^{ij}|^{N/2}\,e^{-\frac12\sum_{\alpha=1}^{N}\sum_{i,j} a^{ij}(x_{i\alpha} - a_i)(x_{j\alpha} - a_j)}. \tag{3}$$

Letting

$$\sum_{\alpha=1}^{N} x_{i\alpha} = N\bar x_i \quad\text{and}\quad \sum_{\alpha=1}^{N}(x_{i\alpha} - \bar x_i)(x_{j\alpha} - \bar x_j) = N s_{ij}, \tag{4}$$

differentiating the logarithm of P with respect to the parameters a_i and a_{ij}, and setting these derivatives equal to zero, the maximum likelihood estimates

$$\hat a_i = \bar x_i, \qquad \hat a_{ij} = s_{ij}, \tag{5}$$

are obtained. Substituting these values in equation (3), we find that the maximum value of the likelihood is

$$P(\Omega_{\max}) = K^N |s_{ij}|^{-N/2} e^{-Nn/2}. \tag{6}$$

The derivation of P(ω_max) proceeds along similar lines, but is simpler, for the probability law for the set ω is obtained from (3) by setting

$$a_{ij} = c\,\delta_{ij}, \tag{7}$$

where c is any positive constant, and δ_{ij} = 0 if i ≠ j and 1 if i = j. The result is found to be

$$P(\omega_{\max}) = K^N s_0^{-Nn/2} e^{-Nn/2}, \tag{8}$$

where s_0 is defined by

$$n s_0 = \sum_{i=1}^{n} s_{ii}. \tag{9}$$

The likelihood ratio criterion is therefore

$$\lambda = \left(\frac{|s_{ij}|}{s_0^{\,n}}\right)^{N/2}. \tag{10}$$

It will be convenient to designate the Nth root of this statistic as L_sn, where the second subscript indicates the number of variates:

$$L_{sn} = \lambda^{1/N} = \frac{|s_{ij}|^{1/2}}{s_0^{\,n/2}}. \tag{11}$$

3. The moments of the distribution of L_sn when the population is spherical. The distribution of L_sn cannot easily be obtained in explicit form for a general n, but the moments of L_sn when the hypothesis tested is true are easily found.

Note first that L_sn may be resolved into two factors which are, when the population is spherical, statistically independent:

$$L_{sn} = \frac{(s_{11}s_{22}s_{33}\cdots s_{nn})^{1/2}}{s_0^{\,n/2}}\cdot|r_{ij}|^{1/2}. \tag{12}$$

The first factor is just the one appropriate for testing the equality of the n variances when the orientation of the coordinate axes is fixed in advance, while the second factor is the square root of the determinant of correlation coefficients. The moments of the distributions of these two statistics are known [3], and since the two are independent (for zero correlation in the population), we may write

$$M_h(L_{sn}) = M_h(A)\,M_h(B), \tag{13}$$

where A and B are used to indicate the two factors, and M_h indicates the hth moment. The moments are given by


(14) 


M»(L.„) 


ft — i h) i„h r^(n(^ ~_iy 

M I WN - J • 


4. Significance test for n = 2. For n = 1, M_h(L_s1) = 1 for any h, as it should, since L_s1 is then identically 1, and the concept of sphericity is meaningless. For n = 2, the expression (14) reduces to

$$M_h(L_{s2}) = \frac{\Gamma(N-1)\,\Gamma(N-2+h)}{\Gamma(N-1+h)\,\Gamma(N-2)} = \frac{N-2}{N-2+h}, \tag{15}$$
and the distribution is thus found to be

$$D(L_{s2}) = (N-2)\,L_{s2}^{\,N-3}\,dL_{s2}. \tag{16}$$

Thus for n = 2, the significance of the value L′_s2 obtained from a given sample of N points in a plane is simply

$$P(L_{s2} < L'_{s2}) = (L'_{s2})^{N-2}. \tag{17}$$

These results for n = 2 were obtained by another method in [1].
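A minimal sketch of (11) and (17), assuming the sample is given as a list of N observation vectors. The determinant helper is a plain Gaussian elimination written for this example, and the function names are illustrative.

```python
from math import sqrt


def det(a):
    """Determinant by Gaussian elimination with partial pivoting (floats)."""
    a = [list(row) for row in a]
    n = len(a)
    d = 1.0
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(a[r][c]))
        if a[p][c] == 0:
            return 0.0
        if p != c:
            a[c], a[p] = a[p], a[c]
            d = -d
        d *= a[c][c]
        for r in range(c + 1, n):
            f = a[r][c] / a[c][c]
            for j in range(c, n):
                a[r][j] -= f * a[c][j]
    return d


def sphericity_L(sample):
    """L_sn = |s_ij|^(1/2) / s_0^(n/2), with s_ij computed using divisor N as in (4)."""
    N, n = len(sample), len(sample[0])
    means = [sum(x[i] for x in sample) / N for i in range(n)]
    s = [[sum((x[i] - means[i]) * (x[j] - means[j]) for x in sample) / N
          for j in range(n)] for i in range(n)]
    s0 = sum(s[i][i] for i in range(n)) / n  # equation (9)
    return sqrt(det(s)) / s0 ** (n / 2)


def p_value_n2(L, N):
    """Equation (17): P(L_s2 < L) = L^(N - 2) when the population is circular."""
    return L ** (N - 2)


L = sphericity_L([(2, 0), (-2, 0), (0, 1), (0, -1)])
print(L, p_value_n2(L, 4))
```

For the four points (±2, 0), (0, ±1) the sketch gives L_s2 = 0.8, and with N = 4 the significance from (17) is 0.8² = 0.64.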


5. Significance test for n = 3. For n = 3 and higher values of n, no simple expression for the distribution seems obtainable. In this case it appears reasonable to fit a Pearson curve of the type

$$y = K x^{p-1}(1 - x)^{q-1}, \tag{18}$$

by adjusting p and q so as to obtain agreement with the first two moments of the actual distribution. The calculations were carried out for L²_s3 rather than L_s3 itself, to simplify the moment expressions. The first moment of L²_s3 is the second moment of L_s3, and is given as a function of N by the equation

$$\mu_1'(N) = \frac{(3N-6)(3N-9)}{(3N-2)(3N-1)}. \tag{19}$$

Recurrence relations, similar to those noted by Lengyel [4] in carrying out a similar task, hold for the moments of L²_s3; hence

$$\mu_2'(N) = \mu_1'(N)\,\mu_1'(N+2). \tag{20}$$

Explicit solution of the equations for p and q in terms of N is possible:

$$p = \frac{(9N+5)(N-2)(N-3)}{2(9N^2 - 8N - 15)}, \tag{21}$$

$$q = \frac{2(9N-13)(9N+5)}{9(9N^2 - 8N - 15)}. \tag{22}$$

For values of N > 30, acceptable approximations to p and q are obtained by carrying out the division indicated in (21) and (22):

$$p = \tfrac12(N - 4) + \tfrac29 + \frac{70}{81(N+1)} + \cdots, \tag{23}$$

$$q = 2 + \frac{140}{9(3N-2)^2} + \cdots. \tag{24}$$

The values of p and q are given in Table I so that those desiring other than the standard significance levels may readily enter the Pearson tables.
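(21) and (22) can be evaluated directly, and the fitted curve (18) can be integrated numerically to see where its tail lies. The sketch below uses Simpson's rule with `math.lgamma` for the normalizing constant; treating the tabulated significance levels as values of L²_s3 (the variable to which the curve was fitted) is an inference from the numbers rather than a statement of the text, and the function names are illustrative.

```python
from math import exp, lgamma, log


def pearson_pq(N):
    """p and q of equations (21) and (22) for the n = 3 sphericity criterion."""
    d = 9 * N * N - 8 * N - 15
    p = (9 * N + 5) * (N - 2) * (N - 3) / (2 * d)
    q = 2 * (9 * N - 13) * (9 * N + 5) / (9 * d)
    return p, q


def beta_cdf(x, p, q, steps=2000):
    """Area of the normalized curve (18) below x, by Simpson's rule (p, q >= 1; steps even)."""
    log_norm = lgamma(p + q) - lgamma(p) - lgamma(q)

    def density(t):
        if t <= 0.0 or t >= 1.0:
            return 0.0
        return exp(log_norm + (p - 1) * log(t) + (q - 1) * log(1 - t))

    h = x / steps
    total = density(0.0) + density(x)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * density(i * h)
    return total * h / 3


print(pearson_pq(8))                         # compare with the N = 8 row of Table I
print(beta_cdf(0.580, *pearson_pq(20)))      # mass below the 5% entry for N = 20
```

The first line reproduces p ≈ 2.3239 and q ≈ 2.0313 for N = 8, and the second gives a value close to 0.05.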

For N a multiple of 4 from 8 to 48, and a multiple of 10 from 50 to 100, the significance levels were taken from the Incomplete Beta-Function Tables, using adequate interpolation. The final Table I was then prepared by filling in the skeleton table by interpolation with respect to N.

From the results of Wilks [5] it follows that −2N log_e L_s3 is, for large N,
TABLE I

5%, 1%, and 0.1% levels of significance for the 3-dimensional sphericity criterion L_s3, and the values of p and q for the Pearson Type I curves used in calculating these levels

    N      5%      1%     0.1%       p          q
    8    0.172   0.083   0.030     2.3239     2.0312
   10     .278    .165    .080     3.3044     2.0194
   12     .366    .243    .139     4.2911     2.0131
   14     .436    .312    .197     5.2816     2.0095
   16     .494    .372    .252     6.2744     2.0072
   18     .541    .423    .301     7.2688     2.0057
   20     .580    .466    .346     8.2642     2.0046
   22     .614    .504    .386     9.2605     2.0038
   24     .642    .538    .422    10.2574     2.0032
   26     .667    .567    .454    11.2548     2.0027
   28     .689    .593    .483    12.2526     2.0023
   30     .708    .616    .510    13.2506     2.0020
   32     .724    .637    .534    14.2488     2.0018
   34     .739    .655    .555    15.2473     2.0016
   36     .753    .672    .575    16.2458     2.0014
   38     .765    .687    .594    17.2447     2.0012
   40     .776    .701    .610    18.2435     2.0011
   42     .786    .714    .626    19.2425     2.0010
   44     .795    .726    .640    20.2416     2.0009
   46     .804    .736    .653    21.2408     2.0008
   48     .811    .746    .665    22.2400     2.0008
   50     .819    .756    .677    23.2394     2.0007
   55     .834    .776    .703       *          *
   60     .848    .793    .725    28.2365     2.0005
   65     .859    .808    .744       *          *
   70     .869    .821    .760    33.2345     2.0004
   75     .877    .832    .775       *          *
   80     .885    .842    .788    38.2328     2.0003
   85     .891    .851    .799       *          *
   90     .897    .859    .809    43.2317     2.0002
   95     .902    .866    .819       *          *
  100     .907    .872    .827    48.2308     2.0002

* No values for p and q were calculated for these values of N; the levels were obtained by interpolation (see text).


distributed approximately like χ² with n(n − 1)/2 degrees of freedom. However, equation (24) above suggests that for large N one may get a very good approximation (for n = 3) by setting q = 2; the significance test for n = 3 then becomes

$$P(L_{s3} < L'_{s3}) = \tfrac12 (L'_{s3})^{N-4}\left[(N-2) - (N-4)(L'_{s3})^2\right]. \tag{25}$$
Probably similar approximations can be found for other values of n.

It is a pleasure to acknowledge the helpful comments and advice which I received from Mr. A. M. Mood of Princeton. Recognition is also due Mr. Wallace Brcy, a student assistant under the National Youth Administration, who aided in the computations.
A SIMPLE SAMPLING EXPERIMENT ON CONFIDENCE INTERVALS

By S. Kullback and A. Frankel

1. Introduction. In order to illustrate some of the notions of the theory of confidence or fiducial limits in connection with a course in Statistical Inference at the George Washington University, we had the class carry out certain simple experiments, following a suggestion in one of Neyman's papers on statistical estimation [1]. In the belief that the experimental data may be of interest to others, we present the results herein.

2. The problem. We consider the problem of estimating the range θ of a rectangular population defined by p(x, θ) dx = dx/θ, 0 ≤ x ≤ θ, and in particular, for simplicity, we limit ourselves to samples of two and four. We consider three possible approaches to the problem, viz., by using (a) the sample range, (b) the sample average or total, (c) the larger (largest) sample value. Let us consider each in turn.

(a) Sample range. Wilks [2] has shown that for samples of n and confidence coefficient 1 − α, the confidence or fiducial limits for the population range θ are given by r and r/ψ_α, where r is the sample range and ψ_α is determined by

$$\psi_\alpha^{\,n-1}\left[n - (n-1)\psi_\alpha\right] = \alpha. \tag{1}$$

For n = 2, α = 0.19 and n = 4, α = 0.1792, (1) yields ψ_α = 0.1 and ψ_α = 0.4 respectively. Accordingly, for samples of two with confidence coefficient 1 − α = 0.81, and for samples of four with confidence coefficient 1 − α = 0.8208, the confidence interval is respectively given by

$$(r,\ 10r) \quad\text{and}\quad (r,\ 2.5r). \tag{2}$$

The length λ_r of the confidence interval is respectively 9r and 1.5r. Using the distribution of r, n(n − 1)(θ − r)r^{n−2} dr/θ^n, we have for samples of two: E(λ_r) = 3θ, σ_{λ_r} = 2.1213θ, and for samples of four: E(λ_r) = 0.9θ, σ_{λ_r} = 0.3θ.
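Equation (1) has no closed-form root for general n, but its left side increases from 0 to 1 on (0, 1), so bisection finds ψ_α. The sketch also evaluates E(λ_r) and σ_{λ_r} from the mean (n − 1)θ/(n + 1) and variance 2(n − 1)θ²/[(n + 1)²(n + 2)] of the range, both of which follow from the stated distribution of r; taking θ = 1 and the name `range_interval` are assumptions of the sketch.

```python
from math import sqrt


def range_interval(n, alpha, tol=1e-12):
    """psi_alpha from (1) by bisection, plus mean and s.d. of the length (1/psi - 1) r."""
    def g(psi):
        return psi ** (n - 1) * (n - (n - 1) * psi) - alpha

    lo, hi = 0.0, 1.0  # g(0) = -alpha < 0 and g(1) = 1 - alpha > 0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if g(mid) < 0:
            lo = mid
        else:
            hi = mid
    psi = (lo + hi) / 2
    factor = 1 / psi - 1                      # interval (r, r/psi) has length factor * r
    mean_r = (n - 1) / (n + 1)                # theta = 1
    sd_r = sqrt(2 * (n - 1) / ((n + 1) ** 2 * (n + 2)))
    return psi, factor * mean_r, factor * sd_r


print(range_interval(2, 0.19))    # about (0.1, 3.0, 2.1213)
print(range_interval(4, 0.1792))  # about (0.4, 0.9, 0.3)
```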

(b) Sample total. Following Neyman [1, p. 357], let us denote by A(θ) the region defined by

$$\theta - \Delta \le x_1 + x_2 \le \theta + \Delta, \tag{3}$$

where θ is the population range, x_1 and x_2 the sample values of the sample E_2, and Δ is selected so as to have P{E_2 ∈ A(θ) | θ} = 1 − α. It is readily found that P{E_2 ∈ A(θ) | θ} = [θ² − (θ − Δ)²]/θ² = 1 − α, from which we find that Δ = θ(1 − α^{1/2}). Accordingly (3) becomes θα^{1/2} ≤ x_1 + x_2 ≤ θ(2 − α^{1/2}), yielding the confidence limits (x_1 + x_2)/(2 − α^{1/2}) and (x_1 + x_2)/α^{1/2}. For the confidence coefficient 1 − α = 0.81 the confidence interval is given by

$$[0.6394(x_1 + x_2),\ 2.2941(x_1 + x_2)]. \tag{4}$$

The length of the confidence interval is given by λ_T = 1.6547(x_1 + x_2), so that E(λ_T) = 1.6547θ, σ_{λ_T} = 0.6756θ.

Let us denote by A′(θ) the region defined by

$$2\theta - \Delta \le x_1 + x_2 + x_3 + x_4 \le 2\theta + \Delta, \tag{5}$$

where θ is the population range, x_1, x_2, x_3, x_4 the sample values of the sample E_4, and Δ is selected so as to have P{E_4 ∈ A′(θ) | θ} = 1 − α. Using the known distribution of the sample average [3] and 1 − α = 0.8208, it is readily found that

$$\frac{1}{12}\left[16\,\frac{\Delta}{\theta} - 8\left(\frac{\Delta}{\theta}\right)^3 + 3\left(\frac{\Delta}{\theta}\right)^4\right] = 0.8208,$$

from which we find that Δ = 0.788θ. Accordingly, (5) becomes 1.212θ ≤ x_1 + x_2 + x_3 + x_4 ≤ 2.788θ, yielding the confidence interval

$$[0.3587(x_1 + x_2 + x_3 + x_4),\ 0.8251(x_1 + x_2 + x_3 + x_4)]. \tag{6}$$

The length of the confidence interval is given by λ_T = 0.4664(x_1 + x_2 + x_3 + x_4), so that E(λ_T) = 0.9328θ and σ_{λ_T} = 0.2679θ.
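The quartic for Δ/θ can be solved by bisection, since its left side, the probability that the sum of four rectangular variates lies within t of 2 when θ = 1, increases on 0 ≤ t ≤ 1. The sketch assumes θ = 1 and derives the multipliers in (6), the length coefficient, and the mean and standard deviation of the length from Var(x_1 + ⋯ + x_4) = 4/12; the function name is illustrative.

```python
from math import sqrt


def total_interval_four(confidence=0.8208, tol=1e-12):
    """Solve (1/12)[16t - 8t^3 + 3t^4] = confidence for t = Delta/theta in (0, 1)."""
    def coverage(t):
        # P(|x1 + x2 + x3 + x4 - 2| <= t) for theta = 1, valid for 0 <= t <= 1.
        return (16 * t - 8 * t ** 3 + 3 * t ** 4) / 12

    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if coverage(mid) < confidence:
            lo = mid
        else:
            hi = mid
    t = (lo + hi) / 2
    lower, upper = 1 / (2 + t), 1 / (2 - t)  # multipliers in (6)
    coef = upper - lower                     # length = coef * (x1 + x2 + x3 + x4)
    return t, lower, upper, coef, 2 * coef, coef * sqrt(4 / 12)


print(total_interval_four())
```

With these formulas the standard deviation of the length comes out near 0.269θ, a little above the 0.2679θ printed in the text; the other quantities agree with (6) to the figures shown.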

(c) Larger (largest) sample value. Again following Neyman [1, p. 359], let us denote by A_1(θ) the region defined by

$$q\theta \le L \le \theta, \tag{7}$$

where θ is the population range, L the larger of the two sample values x_1 and x_2, and q a number between zero and unity, to be determined by P{E_2 ∈ A_1(θ) | θ} = 1 − α. It is readily found that P{E_2 ∈ A_1(θ) | θ} = (θ² − q²θ²)/θ² = 1 − α, from which we find that q = α^{1/2}. Accordingly, (7) becomes θα^{1/2} ≤ L ≤ θ, yielding the confidence limits L and L/α^{1/2}. For the confidence coefficient 1 − α = 0.81 the confidence interval is given by

$$(L,\ 2.2941L). \tag{8}$$


TABLE I 


No. of cases of 
coverage per 
set of 100 
samples 

Frequency 

Range 

Sum 

Larger (Largest) 


Samples 

Samples 

Samples 

Samples 

Samples 

Samples 


of two 

of four 

of two 

of four 

of two 

of four 

69 





1 


70 







71 





1 


72 







73 






1 

74 


1 



1 


75







76 

4 


3 


4 

1 

77 

2 


6 

1 

2 


78 

3 


6 


3 

1 

79 

9 

2 

4 

2 

3 


80 

3 

1 

6 


4 


81 

2 

2 

1 


3 


82 

2 

1 

6 

1 

2 

5 

83 

3 

3 


1 

5 

3 

84 

3 

2 

i ^ 

'l 

4 

1 

85 

3 



3 

2 


86 

2 

2 


2 

2 

1 

87 

1 

1 

2 

1 


1 

88 



1 

2 

1 

1 

89 

1 


1 

1 



90 







91 

1 







39 

15 

39 

15 

39 

15 

Average — 

81.1 

82.1 

80.2 

84.2 

80.2 

82.1 


The length of the confidence interval is given by λ_L = 1.2941L, so that, using the distribution of L, nL^{n−1} dL/θ^n, we have E(λ_L) = 0.8627θ and σ_{λ_L} = 0.3050θ. Incidentally, since L ≤ x_1 + x_2, we have 1.2941L ≤ 1.6547(x_1 + x_2), so that in every case, for samples of two, the confidence interval of procedure (c) is shorter than the confidence interval of procedure (b).

For samples of four, we consider the region (7), where L is the largest of the sample values x_1, x_2, x_3 and x_4 of the sample E_4. It is readily found that P{E_4 ∈ A_1(θ) | θ} = (θ⁴ − q⁴θ⁴)/θ⁴ = 1 − α, from which we find that q⁴ = α. For α = 0.1792, q = 0.6506, so that (7) becomes 0.6506θ ≤ L ≤ θ, yielding the confidence interval

$$(L,\ 1.5370L). \tag{9}$$

The length of the confidence interval is given by λ_L = 0.5370L, so that E(λ_L) = 0.4296θ and σ_{λ_L} = 0.0877θ.


TABLE II

                                          Sample     Range             Sum               Larger (Largest)
                                          size    Theor.  Obs.     Theor.  Obs.      Theor.  Obs.
Confidence coefficient                      2     .8100   .8110    .8100   .802      .8100   .8020
                                            4     .8208   .8210    .8208   .842      .8208   .8210
Average length of confidence interval       2    3.0000  2.9660   1.6547  1.6441     .8627   .8556
  per set of 100 samples                    4     .9000   .8976    .9328   .9296     .4296   .4272
Standard deviation of average length        2     .2121   .2133    .0676   .0581     .0305   .0293
  of confidence interval                    4     .0300   .0335    .0268   .0140     .0088   .0093


3. The Experimental Data. We considered the rectangular population with θ = 1 and obtained the sample values by using pairs of digits obtained from Tippett's random sample tables [4]. Using these observed values, the confidence intervals given by (2), (4), (6), (8) and (9) were computed, and the number of cases in which the value θ = 1 was covered was noted. In all, 3900 samples of two were observed, subdivided into 39 sets of 100 each. The samples of four were obtained by combining pairs of samples of two, and 1500 samples of four were studied, subdivided into 15 sets of 100 each. Table I gives the observed distribution of the number of cases of coverage per set of 100 samples of two and of four. The length of the confidence interval obtained by each of the three procedures was recorded, and the observed mean and standard deviation of the distribution of the average length of the confidence interval per set of 100 samples were computed. (Since they are averages of 100 values, these observations are practically normally distributed.) Table II summarizes these results.
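A present-day analogue of the experiment can be sketched in a few lines. Instead of Tippett's tables, the sketch below substitutes Python's pseudo-random generator (`random.Random`, seeded arbitrarily), takes θ = 1, and applies the interval multipliers of (2), (4), (6), (8) and (9) as printed; the names `sample_statistics` and `experiment` are illustrative.

```python
import random

# Interval multipliers (lower, upper) from (2), (4), (6), (8) and (9), theta = 1.
TWO = {"range": (1.0, 10.0), "sum": (0.6394, 2.2941), "largest": (1.0, 2.2941)}
FOUR = {"range": (1.0, 2.5), "sum": (0.3587, 0.8251), "largest": (1.0, 1.5370)}


def sample_statistics(sample):
    return {"range": max(sample) - min(sample),
            "sum": sum(sample),
            "largest": max(sample)}


def experiment(size, table, n_samples, seed=0):
    """Fraction of intervals covering theta = 1, and the mean interval length."""
    rng = random.Random(seed)
    covered = {k: 0 for k in table}
    length = {k: 0.0 for k in table}
    for _ in range(n_samples):
        stats = sample_statistics([rng.random() for _ in range(size)])
        for name, (lo, hi) in table.items():
            low, high = lo * stats[name], hi * stats[name]
            covered[name] += low <= 1.0 <= high  # True counts as 1
            length[name] += high - low
    return ({k: covered[k] / n_samples for k in table},
            {k: length[k] / n_samples for k in table})


print(experiment(2, TWO, 100000))
print(experiment(4, FOUR, 100000))
```

With 100 000 samples the coverage fractions should sit close to 0.81 and 0.8208, and the mean lengths close to the theoretical column of Table II.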
THE NUMERICAL COMPUTATION OF THE PRODUCT OF CONJUGATE IMAGINARY GAMMA FUNCTIONS

By A. C. Cohen, Jr.

The difference equation

$$\frac{f_{x+1}}{f_x} = \frac{x^2 + c_1 x + c_2}{x^2 + c_3 x + c_4} \tag{1}$$

was used by Professor Harry C. Carver [1] as the basis for graduating frequency distributions in a manner analogous to the use of the differential equation

$$\frac{1}{y}\frac{dy}{dx} = \frac{a - x}{b_0 + b_1 x + b_2 x^2}$$

in the Pearson system of frequency curves. In order to determine a particular f_x by Professor Carver's method it was necessary to perform the complete graduation from the lower limit of the range up to and including the required f_x. When x is large and only isolated values of f_x are required, it seems desirable to have a method for computing f_x directly, and the present note seeks to accomplish this purpose.

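It is well known [2] that the difference equation

$$\frac{f_{x+1}}{f_x} = \frac{(x - \alpha_1)(x - \alpha_2)\cdots(x - \alpha_n)}{(x - \beta_1)(x - \beta_2)\cdots(x - \beta_m)} \tag{2}$$

has the solution

$$f_x = w_x\,\frac{\Gamma(x - \alpha_1)\cdots\Gamma(x - \alpha_n)}{\Gamma(x - \beta_1)\cdots\Gamma(x - \beta_m)}, \tag{3}$$

where w_x is a periodic function of x (w_x = w_{x+1} = ⋯ = k), and Γ(x + 1), for x a positive real number, may be defined in the usual manner by the second Euler integral

$$\Gamma(x + 1) = \int_0^\infty t^x e^{-t}\,dt, \tag{4}$$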


( 4 ) 
which obeys the recursion formula

$$\Gamma(x + 1) = x\,\Gamma(x). \tag{5}$$

When x is a positive integer,

$$\Gamma(x + 1) = x!. \tag{6}$$

Equation (1) is seen to be a special case of (2) for n = m = 2, and accordingly the solution may be written as

$$f_x = w_x\,\frac{\Gamma(x - \alpha_1)\,\Gamma(x - \alpha_2)}{\Gamma(x - \beta_1)\,\Gamma(x - \beta_2)}, \tag{7}$$

where α_1 and α_2 are roots of x² + c_1x + c_2 = 0 and β_1 and β_2 are roots of x² + c_3x + c_4 = 0. The following simple examples illustrate three special cases of this solution.

I. All α's and β's are integers.

$$\frac{f_{x+1}}{f_x} = \frac{x^2 + 9x + 20}{x^2 + 5x + 6}$$

has the solution

$$f_x = w_x\,\frac{\Gamma(x + 4)\,\Gamma(x + 5)}{\Gamma(x + 2)\,\Gamma(x + 3)},$$

which, with the aid of recursion formula (5), can readily be verified by direct substitution.

II. Either the α's and/or the β's are real irrational numbers.

$$\frac{f_{x+1}}{f_x} = \frac{x^2 + 5x + 6}{x^2 + 3x + 1}$$

has the solution

$$f_x = w_x\,\frac{\Gamma(x + 2)\,\Gamma(x + 3)}{\Gamma[x + \tfrac12(3 - \sqrt5)]\,\Gamma[x + \tfrac12(3 + \sqrt5)]},$$

which, with the aid of the recursion formula (5), can also be verified by direct substitution.

III. Either the α's and/or the β's are complex.

$$\frac{f_{x+1}}{f_x} = \frac{x^2 + 8x + 17}{x^2 + 10x + 29}$$

has the solution

$$f_x = w_x\,\frac{\Gamma(x + 4 + i)\,\Gamma(x + 4 - i)}{\Gamma(x + 5 + 2i)\,\Gamma(x + 5 - 2i)}.$$

Since the recursion formula (5) is also valid for complex arguments [3], this solution can be verified by direct substitution just as in the first two cases.
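The direct substitution for cases I and II can be checked numerically with Python's `math.gamma`, taking the periodic factor w_x = 1 (an assumption; any factor of period 1 cancels in the ratio f_{x+1}/f_x). Case III needs a Gamma function of complex argument, which the standard `math` module does not provide. The function names are illustrative.

```python
from math import gamma, sqrt


def f_case_I(x):
    # Solution of f_{x+1}/f_x = (x^2 + 9x + 20)/(x^2 + 5x + 6), with w_x = 1.
    return gamma(x + 4) * gamma(x + 5) / (gamma(x + 2) * gamma(x + 3))


def f_case_II(x):
    # Solution of f_{x+1}/f_x = (x^2 + 5x + 6)/(x^2 + 3x + 1), with w_x = 1.
    r5 = sqrt(5)
    return gamma(x + 2) * gamma(x + 3) / (gamma(x + (3 - r5) / 2) * gamma(x + (3 + r5) / 2))


for x in (0.5, 1.7, 3.25):
    print(x,
          f_case_I(x + 1) / f_case_I(x), (x * x + 9 * x + 20) / (x * x + 5 * x + 6),
          f_case_II(x + 1) / f_case_II(x), (x * x + 5 * x + 6) / (x * x + 3 * x + 1))
```

Each pair of printed ratios should agree to rounding error, including at non-integral x.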
The evaluation of f_x for a given x in cases I and II involves only computation of quantities of the form Γ(x), which can be accomplished through the use of existing tables of Gamma functions for small values of x and through application of Stirling's formula for large values of x. Evaluation of f_x in case III, however, involves the computation of quantities of the form Γ(u + iv)Γ(u − iv), a problem which seems to have escaped previous attention. The remainder of the present discussion will center on this quantity.

The Gamma function for a real positive argument has been defined by equation (4), but for the present purposes it is more expedient to use the definition

$$\Gamma(z) = \lim_{n\to\infty} \frac{n!\,n^z}{z(z + 1)\cdots(z + n)}, \tag{8}$$

which is valid for all values of the complex argument z except at the poles (z = 0, −1, −2, …). The above definition is equivalent to (4) at all points where (4) is valid [3].

From equation (8) it immediately follows that Γ(u + iv)Γ(u − iv) is a real number. In fact, we have

$$\Gamma(u + iv)\,\Gamma(u - iv) = \lim_{n\to\infty} \frac{(n!)^2\,n^{2u}}{[u^2 + v^2]\,[(u + 1)^2 + v^2]\cdots[(u + n)^2 + v^2]}.$$

We now develop a formula applicable in evaluating this quantity when u is a 
sufficiently small positive integer. As a consequence of equation ( 8 ) it can be 
shown that [3] 


(9) r(*)r(i - *) 

Bin rz 

% 

Let z ^ tv in the above equation and we immediately obtain the result 

(10) r(«r(-») - 

When u is a positive integer, we may write 

(11) r(« + w) = (tt — 1 + w)(u — 2 + w) • • • (w)r(M;), 

(12) r(tt — ti;) = (tt — 1 — tV)(tt — 2 — iv) • •• (— ti>)r(— «)). 


The product of (11) by ( 12 ) pves 

r(tt + w)r(tt - iv) «= »*(»* + 1) •••(»* + tt - i*)r(tti)r(-w) 

which upon substitution of the value found in Equation (10) for r(tv)r(— w) 
becomes 


r-i 


(18) 


r(tt + w)r(tt — iv) 
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To obtain a result that is applicable when u is not a positive integer, we 
make use of Stirlitq^s formula for complex arguments. Lipschits {4} proves 

Log r(«) * log V2iir + (s - i) log * 

T 1 ( 

i:J(2m+l)(2w + 2)s*-H-‘ 

and that the remainder after the mth term is 

_ 1 , . - 
(2rn + 3) (2m + 4) 2 »--« ^ ^ 

where e < 1; e' < 1. J3sm+i designates the Bernoulli numbers. (Bi = J; 
= Vsi = A; etc.) We are thus able to write 

Log r(tt + ti») = log T(Re^) 

= log + {Re'* - i)(log R + tip) 


_ 4. V C 

is (2m + l)(2m + 2) JP-+‘ ’ 


where p = tan ‘ - and 12 = -s/m* + »*» 
u 

Log r(u — tv) = log r(jBe“”’) 

f,a\ = + {Re~** - i)(log R - tv) 


_ PjT** J. V S — Ij D»m+\ e 

m — 0 (2m *b l)(2m + 2) iJ*"-*-* 

Adding (15) and (16), we obtain 

Log r(M + tv)r(M — tv) = log 2t + (e*” + e~'*)R log R — log R 

+ Rup{e** - e-'*) - R{e'* + e~**) 

, y' ( ~l)**.B»w+i 

(2m + l)(2m + 2) ^ ® ^ JP«h-i 

which upon being simplified becomes 

Log r(M + tv)r(u — tv) 

(17) 

= log 2r -f ( 2 m — 1) log R — 2(^ -f u) + 2^i{R, <p), 


where 


m ^) » z 


{-!)•' Bu^i 1 


% (2m + l)(2m + 2) /2**+‘ 


cos (2m + l)ip. 


This result is somewhat similar to that obtained by Karl Pearson [5] in omi- 
nection with the evaluation of the G(„) integrals of his T}qpe IV frequency 
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curve. If B > 1, the expansion of i>{R, is ass^mptotic and the gFeatest 
numerical value that the.mth term can have is 


1 

(2m + l)(2m + 2) 

Thus according to Lipschitz results, the error committed in dropping all terms 
after the mth will not exceed: The following 

table gives an indication of the size of the error: 


Terms omitted Error committed in 

after ^(R, ip) less than 

1st ±.0833 3333/iZ 

2nd ±.0027 7777/B* 

3rd ±.0007 9365/72* 

4th ±.0005 9524/72' 

5th ±.0008 4175/72*. 

It is now obvious that formula (18) will give satisfactory results whenever 72 
is sufficiently large. The degree of accuracy required together with the value 
of 72 will determine the number of terms of ^^'(72, tp) to be computed. 

We now turn to the solution of the example under Case III and proceed to 
calculate U , /« , and fm when /o = 29. We may write 

_ oor(5 + 2i)r(5- 2t) 
f (4 + t)r(4 - *•) • 

Application of formula (13) gives 

r(5 + 2i)r(5 - 2t) = 244.043 648, 


r(4+ t)r(4 - i) = 27.202 292, 

from which, K = 260.171 676, 

f oan 171 A7R f)r'(8 i) 

/, = 260.171 676~jj^.^yjg— 

Again making use of formula (13) we have 

260.171 676. 5.6722, 


/u = 260.171 676 


r(19 + i)r(19 - i) 

r(20 + 2i)r(20 - 2t) ■ 


Since 72 is fairly large in this instance, formula (17) is used and all terms of 
4/(R, ip) after the first are dropped. This result gives 


log r(19 + »)r(19 - i) = 31.5892 259, 
log r(20 + 2f)r(20 - 2t) = 34.0812 782. 
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Accordingly, log/« =* 9.9232 071 —10 

and fu =* .8379. 

By the same method /uo is calculated and we find fun — .008723. 

As a check on the accuracy of the results obtained in the above computations, 
values of /, for x ranging from 1 to 15 were computed, using the given equation 
as a recursion formula. That is 

fi = = 17, /, - = 11.05, etc. 

These results are given in the following table, and it is to be noted that the 
values in the table for fi and fib agree with those previously computed by use 
of formulas contained in this paper. For obvious reasons, no attempt was 
made to compute the value of fm by this method. 


TABLE I 


X 

/. 

X 

/. 

X 

/(*) 

0 

29.0000 

5 

4.3375 

10 

1.6228 

1 

17.0000 

0 

3.4200 

11 

1.3961 

2 

11.0500 

7 

2.7633 

12 

1.2135 

3 

7.7142 

8 

2.2779 

13 

1.0644 

4 

5.6722 

9 

1.9092 

14 

0.9411 





15 

0.8379 
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COMPARISON OF PBARSONIAN APPROXIMATIONS WITH SXACT 
SAMPLING DISTRIBUTIONS OF MEANS AND VARIANCES 
IN SAMPLES FROM POPULATIONS COMPOSED OF 
THE SUMS OF NORMAL POPULATIONS 

By G. a. Bakes 

1. Introduction. Biological and sociological data are often “non-homoge- 
neous" and of such a nature as not to be easily separated into components. 
Non-homogeneous populations have been discussed by Karl Pearson, Charlier, 
and others. Non-normal material has been discussed by many writers. See 
for example, A. E. R. Church [1] and J. M. LeRoux [2] for a discussion of 
moments of the distributions of the means and variances for samples from 
non-normal material. 

In a previous paper [3] the author has given the distributions of the means 
and standard deviations of samples from certain non-homogeneous populations. 
The purpose of the present paper is to extend the results given in [3] and to 
compare the moment approach of the Pearsonian school with the true distri- 
butions. 


2. Moments of the distribution of means of samples of n from a nmi-homo- 
geneous population. Consider a population with distribution 


( 2 . 1 ) 


/(x) L + 

(1 -h k)'^2v L 


The first four moments of (2.1) about x = 0 are 

km 


( 2 . 2 ) 


/ 

Vi = 


/ 

Vt - 


t 

v» - 


/ 

Vi - 


H-A 

1 

1 ■+- A; 

km 
1 -|- A 

1 

1 + Jfc 


[l + k{<T* + m^)] 

[3<r* •+- tw*] 

[3 “b kiSff* "b 6wiV "b w**)]. 


The means of samples of n drawn at random from (2.1) are distributed 
according to 


(2.3) y'^d 4- k) 


n I ^ M 

1 + kY V*/ 


Jk* 


\/5cr* + W — N 


exp 


- >)" 
-b n — a 
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Denote by nip the moments of (2.3) about x = 0 and by nip the moments about 
the mean. Then in view of the relations 

»I s' _ . u! 

(n — «)! s! ^ " (n — «)! (» — r + t)l 

(2.4) Ar(r-i) = 1, Asi = 3, Ail — 6, A« = 7, 

j4si — 10, Ak “ 25, .4bs — 15, 

and similar relations, and reduction to moments about the mean we obtain 


(2.5) 


/ km ! 


mi - + k) + 6(n - Dfar* 

n*(l + ky L 


+ 


6^ 

1 + k 

k 


+ \k + {n- l)}m* + {(« - l)fc + 1 ]mV 


mg = 


\k’‘ + (3n - 4) A: + 1 )m*j 
j^l5|(2n — 1)A: + l}m<r^ — 15 {A + (2n — l)l?ra 


(1 + ky 

k 

ni(r+ ky 

+ 30 (n — 1)(1 — fc)mor* 


+ { - (n - l)fc* + 4(n - 1)* + 1 )toV 

+ {- A:* - 4(n - 1)A: + (n - !)}»»* 

+ {-*"+(- lOn + IDA:* + (lOn - ll)fc + l}m‘ 

The expressions for the first five moments agree with the results given by 
Church and Tchebycheff. 

The betas of (2.4) are 


iB, 


A:*m* 
n(i + k) 


^ + r-;-* 

*»' + ‘ + rfi 





( 2 . 6 ) 
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(2.7) iB,-3 


- 3 = 


3 + 3<r* - 6<r* 


6 j , 6 * A!*-4fc4- 

"l-j + ji!”**' (1 + k)* 






+ 1 + 


l + jfc 


n^] 


ifii vanishes if 1; == Q, m = 0, or A; = 1 and tr — 1. If A; and <r are constant 
and m approaches infinity iB] approaches (1 — kf/nk. If k and m are constant 
and 0 - approaches infinity iBi approaches zero. iBj — 3 vanishes if A; = 0, 
k = 00 , or if j» = 0 and <7=1. If A: and <7 are constant and m approaches 


TABLE I 


mt and compared for four teia of values of k, <r’, and m 


Sets of values 

ifc <r* m 

mi 

pMt 

1/2 1/4 1.1 

4.599 1.228 

n* ^ n* 

,(— f-f) 

1/3 1 3.2 

89.702 39.322 


n* n* 


-1/4 1/4 0.6 

4.640 1.744 


n* w* 

- (‘^f) 

1 4 5.6 

1,302.840 646.060 

347.204 2.406\ 

, (762.036 r-l 

1 \ n n* / 

n* n* 

”■ (■-?) 


infinity then iB* — 3 approaches (A:* — 4A: + l)/nA:. If k and m are constant 
and (7 approaches infinity then iBj — 3 approaches Z/nk. 

It is of interest to compare the higher moments of (2.3) with the higher 
moments calculated from the first four moments on the assumption of a Pearson 
curve in place of (2.3). On this assumption 

/„ os 2 to, (fiu + 7mltnt - 3m»m5) 

(2.8) pm. = — + * 

It is seen that (2.8) bears little resemblance to m. . If we ooninder the 
difference pm. — m. we see that it is of the same order in 1/n as k m* and tAte 
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numerator is of the 16th degree in k, m, and tr; a very complicated locus, mi and 
pmi are compared for certain values of the parameters of (2.1) in Table I. 

Table I shows that the coefficients of 1/n* in the expressions for mi and „mi 
differ by from two to more than 40 per cent. The coefficients of 1/n* differ 
even more. The assumption of Karl Pearson’s curves to represent the distri- 
bution of means of samples of n from non-homogeneous populations seems to 
be adequate in some cases but inadequate in others even for moderate values of 
the parameters. 


3. Moments of the distribution of variances. In [3] an estimate of n times 
the standard deviation squared is expressed as 

(3.1) TT = (n — «) ff* -f- Sffj -|- (nil + ^)*, 

n 

where a bar over a letter means an estimate of the corresponding population 
parameter and where (n — s) denotes the number drawn from the first com- 
ponent of (2.1) and s denotes the number from the second component. 

For the direct calculation of the moments of the distribution of variances 
it is easier not to use the distribution given in [3], but to proceed as follows. Put 

(n - «) ffj = y, sal = X, ^ (ihi -f nij)* = z. 


Of course, for population (2.1) <ri = 1,0-2= o-, mi = 0, mi = m. The variables, 
X, y, -z are all independent in the probability sense and their probability distri- 
butions are well known. Hence the moments of 


(3.2) 


W _ x + y + z 
n n 


can be directly calculated. 

For instance, if p = 1 then 

( 3 . 3 ) + ‘ + 

In general, of course, the moments about the mean check with the values given 
by Church. 

It is generally recommended to represent the distributions of variances of 
samples from non-normal parents by Pearson’s curves. Let us examine the 
results of this procedure in a special case. 

Suppose that the sampled population is 

(3.4) /(*) - ^ [e-^‘ + . 

The first eight moments of (3.4) which are needed in the calculation of the first 
four moments of the variances are: 
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(3.6) 


Vt « 1.7000 »» = 0 

vt » 3.8900 Vt » 204.47 

Dl s= 0 — 0 


i;4 = 28.692 Vb = 3,818.4. 





Fia. 1. Comparison of the True Distribution op the Variances of Samples of 4 
Drawn from the Non-Homogbnbous Population (3.4) with the Corresponding 

Empirical Pearson Curve 


The first four moments of the variances of samples of 4 from (3.4) are: 
tM'i = 2.918 = 4.745 

(3.6) 

tMt = 3.396 tMt = 41.62. 

Hfflice »Bi = .60 and tBi = 3.6, « = — .87 which calls for a tjrpe 1 curve. The 
equation of the curve is 


(3.7) 


0.2281 (l+j^)“"(l- 5^)"' 
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with its origin at its mode. The corresponding true distribution with the 
origin at the beginning of the range is 

yt = [.3989V* + .003550 sinh (3.4 V^) 

(3.8) 

+ .0005454 sinh (6.8 V*)]. 

Distribution (3.8) differs slightly from the corresponding result given in [3] 
because of an error in that paper. 

The two distributions are compared in Figure 1. It is seen that the two 
distributions are quite different. As the number of components of distributions 
similar to (3.8) increases, which is true as n increases, the distributions may 
be expected to become smoother and more closely representable by a single 
smooth curve. 

4. Sununary. The moments of the di.stribution of the means of samples of n 
from a non-homogeneous population composed of two normal component are 
given up to and including the fifth. This fifth moment is compared with the 
fifth moment calculated on the assvunption of Pearson’s curves to represent 
the distribution of means. The B’s of the distributions of the means are dis- 
cussed in certain limiting ca.ses. It appears that for small samples and extreme 
values of the parameters, and in some cases of moderate values of the parame- 
ters, the Pearsonian approximations give poor results. 

Some identities involving the binomial coefficients are given which permit 
the reduction of the moments of the distribution of means calculated directly 
to forms given elsewhere [1]. A method is given for the direct calculation of 
the moments of the variances of samples from a non-homogeneous population 
composed of two normal components. An indication of the closeness with 
which a Pearson curve can be made to fit the distribution of variances in small 
samples from a non-homogeneous population is given in Figure 1. 
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A LEAST SQUARES ACCUMUUTiON TSBORSM 

Bt W. E. Blbick 

The following simple least squares theorem does not seem to have been men- 
tioned in the literature, and has at least one practical application. 

If A*{x) and B*{x) are polynomials of the same degree which are least squares 
representations of the functions A(x) and B{x) re^>ectively, for the values 
* 1 » ** , , • • • » ® j» > then 

(1) t. A*(xt)B(xt) = t. A{x,)B*(x,) - £ A*(xdB*{xt). 


To prove the theorem let 


A*(x) * 


(3) B*(x) = Z hi^. 

1-0 

Then the normal equations for the determination of a< and by are 


Z OiSi+k = Z 

<-0 (-1 


x'lAixt), 


fc = 0, 1, 2, . • . , m, 


^ 6,«y+* = ^ x'lBix,), 


h — ' 0) 2| • • • j 


where ^ . 

<-i 


Hence, by (2) and (5) 


^ A*(xt)B{xt) = ^ Z Bixt) 
1-1 (-1 L<-o J 

= Z ^ !f*B(x,) 


f-1 


= Z 23 OtbiSi+t if » ^ TO, 

}->0 

= ^ A*(xt)B*ixt) if n ^ TO. 


Similarly it can be shown that 


^ A{xt)B*(xt) = ^ A*(xt)B*(xt) if to ^ n. 
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Combining (6) and ( 7 ) we have 

(8) i ^ if w-n. 

(>i (-1 (-1 

In the particular case A{x) » B{x), equation (8) gives the interesting result 

(9) i A*{xi)[A{.Xt) - A*(ar,)] » 0. 

t-i 

An obvious extension of equation (6) is 

( 10 ) i x]A*{xt)B{xt) = i x\A*{xt)B*{xt), if n ^ m + 9, 

t-i (-1 


where 9 is a positive integer. 

A practical application of (8) has been made by one large insurance corn- 
pany in the case men- 1. Suppose that A {x) represents an annual payment 
made x years ago and is an approximately linear function, and that B{x) repre- 
sents a compound interest function. Then, even if B{x) is not a linear function, 
we may write approximately 

^ A{x)B{x) s ^ A{x)Bi*{x) 

»»1 9^1 

si A{x)(&o + h.x) 

( 11 ) 

S ho i A{x) + hi i xAix) 

*»1 1 


s ho i A(i) + hi i i A{y). 

jbmI * a#—! y^x 

Thus if a year-by-year record is kept of the annual payments A(x), the sum 
i A(x), and the double sum tt A{y), and if ho and hi are tabulated func- 


s-l 


tions of p, equation ( 11 ) affords a convenient method of evaluating i A{x)B{x) 

1 

approximately. 

The author wishes to acknowledge that the case m n » 1 of equation (8) 
and the above application were brought to his attention by John K. Dyer. 
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PARABOLIC TEST FOR URKAOE 
Bt N. L. Johnson 

1. Introductbrn. In this paper a problem in testing statistical hypotheses 
which has applications in genetics will be treated from the standpoint of the 
Neyman-Pearson approach. This approach has been developed in a series of 
papers, [4], [5], [6], [7], [8], [9], [10], to which the reader is referred for definitions 
of the concepts of a simple statistical hypothesis, critical regions, power function 
of a test with respect to alternative hypotheses, and that of a test unbiased in 
the limit employed in the present paper. 

2. Statement of Problem. We shall consider M independent experiments, 
which will each yield results falling into one of the four categories described by 
the possible combinations of the 4 events o, not-a (or fi), b, and not-h (or h) 
as set up in the following table. 


1 

a 

not-a 

[ 

b 

pi 

p. 

Pi 

not-6 

Pa 

1 

P4 

1 - Pi 


P* 

1 - P, 

1 


We shall assume that the marginal probabilities are known and have values 
Pi , 1 — Pi , Pj , 1 — P 2 as shown in the table. Thus Pi = probability of 
event h happening whether event o occurs or not. It is obvious that if, further, 
the probability of a result falling in any one category or cell is fixed, then the 
other three cell probabilities will also be fixed. For if pi , Ps , Pt , p< be the 
four cell probabilities as shown in the table above, we must have 

(1) Pi + P» = Pi ; Pi + p» = P* ; p» + P4 = 1 - P* . 

Hence the values of the cell probabilities will be determined by a single parameter 
6, say, as follows 

Pi = PiP*e* Pi = Pi(l - Pje') 

p» = P*(l — Pie*) p4 = 1 — Pi — Pj + PiPiC*. 

The range of values which 8 may take for the set of admissible hypotheses is 
found from the conditions 
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(3) 0 < Pi < 1 (i » 1, 2, 3, 4) 

to be 

(4) — 00 < J < min (—log Pi , —log P») if Pi + Pj < 1 
but 

(5) log (Pr‘ + PT* - PT'PT') < 0 < min (-log Pi , -log P,) if Pi + P, > 1. 

The hypothesis tested, Ho , is that 0 = 0, i.e. that the events o and b are 
independent. It will be noticed that Ho is a simple hypothesis, since it specifies 
the probability law of the observed variables completely. In fact, if be 
the number of results out of our M experiments which are in the ith category, 
then »ti , TWj , mg , wi are our observed variables, and we have 


(6) P{mi = ml, m* = mg, mg = mj, m« = mi 1 Ho\ 


MlP^P^Pi* 

mi! mg! mg! mTI 


where pot is the value of p< when 0 = 0. 

This is the conceptual model used in testing for linkage in two pairs of genes; 
Ho corresponds to the hypothesis “there is no linkage.” Fuller explanations 
are given by Fisher [3]. It should be noted, however, that Fisher uses a pa- 
rameter 0 corresponding to in this paper. 


3. Basis of Selection of Test. The question now arises; what test shall we 
choose for the hypothesis Ho^ That is, what should the critical region w be 
to give us results as satisfactory as possible? The main aim must be to avoid 
errors, both of first and second kind, as far as possible. The first kind of error 
is subject to control, since the probability of the sample point E falling in w 
when Ho is true (which we shall denote by P{E tw\ Ho}) can be determined 
approximately, Ho being simple. The critical region w is therefore chosen, if 
possible, to give a definite level of significance to the test associated with it. 
However, there will usually be many regions which will do this, and in 
order to decide which of them give more satisfactory results we consider 
(1 — P{E tw \ H})‘, i.e. the probability of the second kind of error with respect 
to an alternative hypothesis H, the first kind of error being fixed. 

In the present case H will be determined by 0 and so we may put 
P{E « to I H j = /3(to I 0), where P(w 1 0), considered as a function of 0, will be 
the power function of the test associated with the critical region to. We want 
w to be such that $(w | 0) = a. a being the fixed level of significance while 
/0(w I 0) is as large as possible. 

It is also desirable that we should accept the hypothesis Ho more often when 
it is true than when any one of the alternative hypotheses (H) is true. Ex- 



TXST rOB UNKAOB 


in^ssed symbolically, this means tbat 

(7) 1 0) < 0iw 1 e) for aU fl 0, 

Any test satisfying the last condition is sud to be unbiased. 

If and — are each continuous and differentiable fimctions of 0, and we 
od 

consider only those alternative hypotheses specified by suitably small values 
of 0, sufficient conditions for the test to be unbiased will be 


0 , 


( 8 ) 

00] 

00]t^ 

( 9 ) 

til 

00* 


> 0 . 


According to the terminology recently adopted by Daly [1], the tests of 
which it is known only that they satisfy (8) and (0), are called locally unbiased. 
If a region w could be found such that, v being any other region for which 


( 10 ) 


0(w 1 0) = ff(v 1 0), then I3(w \ 8) > 0(v 1 0) 


for all 0^0, this would give a test which would be the best with respect to any 
alternative hypothesis. However, it has been shown by Neyman (4) that under 
certain conditions, which many probability laws satisfy, such a test will not 
exist. An attempt is therefore made to control the power of the test with 
respect to hypotheses specifying values of 0 near to 0; hoping that the powers 
of the tests so obtained with respect to the other hypotheses will behave in a 
satisfactory manner. Thus Neyman and Pearson [9] define an “unbiased test 
of Type A” as a test corresponding to a critical region w such that if t; be any 
other region in the sample space TT for which 


(11) 

0{w 1 0) = j3(p 1 0) = a 

and 


(12) 

00{w 1 0)1 _ 00{v 1 S)! _ 

dO J#-o dO J#-o 

then 


(13) 

0*0iw 1 8)1 ^ 0*0iv 1 0)1 

J#-o “ 00* J»-o' 


(14) 


In the problem which I am treating the conditions 

d0(to I g) ") 

’ 00 J»-o 


0iw 1 0 ) 


0 


implied by (11) and (12) above cannot, in general, be satisfied, since the distribu- 
tion is discontinuous, i.e. P\E tw\Ho\ is a discontinuous function of w and, in 
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fact, for a given sample size, has only a finite number of possible values, none 
of which need be equal to a. 

However, it may be possible to find a test of Ho of a type called “unbiased 
in the limit (as M increases), based on the limiting form of the multinomial 
distribution which is a continuous function of w. The definition [6] of a test 
“unbiased in the limit-’ will be taken as follows: 

Suppose we have a seqmnce (wm) of critical regions^ Wm corresponding to a 
sample of size M, such that 

(i) for any Af , if v m he any region for which 


(15) 

PiWM 

10) = 

P(Vm 1 0) 

and 




(16) 

dfiiwM 1 BY 

1 = 

dpivM 1 9)1 

dB 


d$ 

then 




(17) 

d^P(wM 1 B)' 
dB^ 

> 

a'i3(t;^j9)1 

3$^ Jtf-o 

(it) 




(18) 

lim 

AT -^00 


0) == a. 


(m) if 

(19) & = - 0) = VMe 

(20) liin — 1 = 0 

Af~»ao OU 


then the test associated with this sequence of critical regions is unbiased in the 
limit I shall call such a test a test of type A ^ . 

The reason for using ^ as the variable in condition (19) abovc^ is that, unless 
our sequence of critical regions has been very badly or unluckily chosen, we 
shall have 


lim P{wm 1 ^) = 1 


dfiiwM I e) 


{e 7 ^ 0 ) 


while, by (18), lim $(wm | 0) = a and so, in general, lim ' will not 

exist at 0 = 0. Hence we introduce i?, termed the normalized error ^ and, keeping 
d constant (and hence making B tend to zero) we form lim ~ . 

In the next section will be obtained a test of Ha which is of type A „ . 


4. DeriTation of Test. The composition of a sample of M experiments is 
uniquely determined by the numbers of results falling in the 1st, 
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m 


2nd and 3rd categories respectively. Thus any sample may be represented by a 
point E(tn) in a three^imensional sample space W(in) with coordinate axes of 
mi, ntt, and mj . It will occasionally be convenient to represent the sample 
by a point in a three-dimensional space with other axes. The following sample 
spaces will be used. 



W(m)-- 

-space with coordinate axes of mi , 


ma 



Tr(d)- 

“ “ di , 

df f 

dz 



W{x)~ 

it it it ti it ^ 

Xi , 

Xfy 

Xz 



W(n)- 

it it it ti it „ 

71% , 

rh , 

nz 


where 






(22) 


di = nii — Mpoi 


(i = 

1, 2, 3, 4) 

(23) 


Xi = (m< - Mpoi)/(Mjhi)^ 


(t = 

1, 2, 3, 4) 

(24) 


iii = rrii/M 


(i = 

1, 2, 3, 4). 


1 shall use Wu indifferently to denote “the critical region corresponding to 
sample .size Jlf” in any of the four sample spaces above; E indifferently to 
denote corresponding positions of the sample point in any of the four sample 
spaces; except in cases where confusion might arise, where I shall use w«(m), 
Wtiid), Wuin) and E{m), E(d), E{x), E(n). When necessary the size of 

sample with which a point E is associated will be denoted by a subscript; e.g. Em . 
In finding a test of type ^4* we shall need to consider the quantities 

I 0), - -■]„. «"<1 whem » . « VB. 

The probability law of the observed values mi , to* , jrs is discontinuous with 
respect to the points of the sample space W m • For if ^ be a point which 
corresponds to integral values mj , ?»* , mj of mi , m* , ms ; subject to the re- 
strictions 

(25) 0 < mj (t = 1, 2, 3) 

(26) 0 < Z m? < M 

<-i 


then 

(27) 


P{Em « ^\» = 01 


m^ mt! ms! mt\ 


where 

4-1 


( 28 ) 
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and 

(29) 


Pn 

Pm 


Piil - ft) 
(1 - ft)(l 


ft) 


0 


pn * PiPt 

poi = ft(l - ft) 
while if be not such a point 
(30) P{Em - I «) 

whatever the value of 6 may be. Now 

(3,) 

w n I ifhi I tit$ I tiii I 

where pi , pi , p* , p 4 are as defined in (2) above, and 2 denotes a finite sum- 

wja 

mation over all points E' in t» for which P\E ^ ^ E' \6] 0. Differentiating 

each side of (31) with respect to 6, we get 

dfijWM I $) 

&B 


(32) 


■1 =2 

J»-0 WM 


M\ p'SivVyV p'Si 

mi!tnt!7n8!m4! 


rmi(l - Pi - Pt) — mtPt - fft.Pi -f ffi4PiP»1 

I ' ( i - ft )( r - ft ) ■■ J 


and 


d*J^\e) 
atf* 




V Mi pti'pWipo: 

urjr ntilTnit ffitifft.! 


(33) l{”*i(i — Pi — ft) ~ w»*P» “ w*»ft + MPiPj}* 

- {fftiPiP,(l - ft - P,) + ffhPtd - ft - PiP,) 


ffiiPid -Pt- PiPt) - MPM - Pi)(l - P,)|]. 
Theorem 1. The sequence of critical regions (wm) defined by 
(34) V -f- Bu^ > A in Wm r Bu^ < A elsewhere, 


where 

(35) u 

(36) V 

(37) B 


*i(ftPi)*(l - ft - Pi) - XiPld - ft)‘ft - a:.Pi(l - ft)*ft 
{PiPid - ft)d - P,)l* 

Pid - Pi)(2P, - l){*,(ftP,)* + a:,Pi(l - ft)M 

+ P,(l - P,)(2P, - l){®,(PiP,)* + a^Pld - P.)* 
[Pift(l - P,)(l - P,) {P,(l - ft)(l - 2P,)* -I- P,(l - ft)(l - 2ft)*}l‘ 

r AfftPtd - Pi)(l - P,) 1* 

Lft(l - P0(1 - 2P,)* + p,d - ft)(l - 2ft)*J 
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388 


nti - Mpm 


and Xi 


as defined above, ie aeeoeiated with a test of Ihe hypotheaie 


{Mp^y 

Ho{6 = 0) which ie unbiased in the limit, of type A„ at level of eignifieanee a, 
provided that 


(39) 


0 < P< < 1 


and Pi and Pt are not both equal to 
In Lemma 1 of the Appendix (paragraph 9), put s <= 2, and let 

fi — individual members of the summation for /3(ww 1 0) 


/* = 


fi ~ 


« bfiiwii 1 6) 


L 

>* J»-o 


(» - 1 , 2 ) 

(see (31)) 
(see (32)) 

(see (33)). 


(40) 


From Lemma 1 we see that the regions (w) defined by 
ft > oi/i + Oj/i in w 
ft < difi + (hft elsewhere 
will maximize 2 ft with respect to all regions for which and £/j are fixed. 

w w w 

(ai and are arbitrary conatantfl depending on the fixed values of 2/i 

-i-. u; 

J2fi)‘ Hence any sequence of critical regions (w*) defined by 


(41) 


|m,(l - Pi- P») - mtPi - nnPi + 

- {toxP,P,(1 - Pi- Pt) + mtPtil - Pi- PiPt) 

+ m.Pi(l - Pt- PiPt) - MPiPt{\ - Pi)(l - P.)} 

> ai|TOj(l — Pi — Pi) — m»P* — WjPi + MPiPt] + 0* 

in Wm , will satisfy conditions (t) given above in the definition of a test of 
type . The inequality (41) may be rewritten 

jw»i(l — Pi — Pi) — miPt — mtPi + MPiPt ~ Oi|* 

(42) - [P,(l - Pi) Im, - MPiil -Pi)] 

* +Pi{l-Pt){mt-MPt{l-Pi)]]>at 


the ais being arbitrary constants. 

Also, by Theorem 1 of the Appendix, we have that, for any given • > 0 
and any region w, there is a number M, independent of w and such that for all 
M > ilf . , 
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(43) 1 $(w I 0) - /(«») I < 6 
where 

(44) I(w) = /// e-*’^idx^dx,dx, 

«(*) 

and 

3 

(45) Xo = £ a:<(l + JhipTt) + 2 a;<a:,(po.po,)*PM . 

*-i «;:S8 


We will now apply a transformation to the coordinates m\ vh which will 

(а) transform inequality (42) into a simpler form, 

(б) transform I{w) into a form to which the tables of the Normal Probability 
Integral may easily be applied for purposes of calculation. 

This transformation is 


(46) u = " 


_ a:i(PiP*)*(l - Pi - Pi) - XtP\{\ - P,)*P, - x,PiO - P,)*P, 


(47) t; = 


(48) t = 


{P,P.(1 - P,)(l - P*))» 

Pj(l - P.)(2P, - l){xx(PiP,)‘ + x,P|(l - P,)M 

+ P*(l - P*)(2Pi - l){xi(P,P*)* + a^Pid - P*)M 
[P,P,(1 - Pi)(l - P,){Pi(l - Pi)(l - 2P,)* + P*(l - P,)(l - 2Pi)*)l‘ 

(2Pi - l){xi(PiP,)‘ + XsPid - Pi)‘} 

- (2P, - l){xi(PiP,)‘ + x,P}(l - P,)M 


{Pi(l - Pi)(l - 2P,)* + P*(l - P*)(l - 2Pi)*}‘ 


This is a proper transformation, since under the conditions of the theorem 
0 < P* < 1 and Pi and P 2 are not both and the Jacobian 


(49) 


j _ diu, V, t) _ _j 

9(xi,xj,xj) 


is non-zero and of constant sign. 
Also 


(50) 


xo = 


+ t;' + i\ 


Hence 

(51) I(w) = dudv dt. 

The inequality (42) is transformed into an inequality of form B(u — a*)* 4- f ^ 
where B has the value stated above; oj and A being at present arbitrary 
constants. 

Therefore we may put o$ «= 0 and define A by the equation 
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and conclude that the sequence of critical regions (tOu) defined by the in^ 
equalities 

5w* + V > A in Wm 
(53) 

Bu + V < A elsewhere 

will satisfy conditions (f) for a test of type A * . 

From (51) and (52) 


Hwm) 


hJH 


(64) 


(2t)« 


du dv dt 




dv\ du = a. 


By Thkokkm 1 of the ap{X‘ndix, as mentioned above, we have 


(55) 

1 /3(m’« 

|0)- 

J{wm) I 

< 6 for all M > Mt 

i.e. 





(50) 

1 i3(v> 

-^10) 

— a 1 < 

€ for all ilf > Af, 

and so 





(57) 


ff(WM 

1 0) -♦ a 

as M 00 . 


Thus the sequence of critical regions {w m) satisfies the condition (it) of the 
definition of a test of type . 

If w be any region defined by inequalities on u and v only (as are the regions 
w m) then, as a special case of Theorem 1 of the Appendix, we have that for 
any € > 0 there exists a number M, such that for all Sf > M, 


(58) 


Pm{w) - ^ // dudv < e 


w(«iV) 


where P v(u>) = P{£^ « « lo ] 0} . 

By (31) and (32), noting that = y/M • — , we have 


(59) 


3|8(«) 1 1>)1 
“ at? 


ae do 

I = E/i(«, v).u.(PtP,)*(l - Pi)~*(i - P*)' 

= 2/i(«, «')•«* 


where k = (PiPi)*(l - Pi)"*(l - Pi)"‘ > 0. 

By Theorem 1 of the Appendix, as last stated above, we have 


(60) 


Mu, v) - i A«Ai;.c-*'“’+’’‘>(l + Bm) 
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where for convenience we have written Am, At; for A(j#)M, A(joti the units of m 
and V when sample size is M, and Rm for Rm(u, v) which has the property that 

(61) Z Rm(u, i;)A(iioM. A( m) 0 

W 

uniformly with respect to as Af — ♦ « . 

Now let denote that part of w where Rm > 0 and that part of w where 
Rm < 0. Then 

(62) Z ,) = Z fc . + Z k 

IC + JtTT 

Let 


(63) 


tr 2v 

= i ^ /(b. ,„-1-«->|/(b, ^|^)‘ . 


By Schwarz’s inequality 

si AuAv s 


(64) 

But 


”-J(w •+«>•) 


^ aiLaV 2 n - 

t 5 'S' “ 

AmAv j 


AmA» r> 
~^R^e 

lHuiiv 




(66) g «Vt(M, .) = 5 + 5^2, 

Now wVi(w, v) > 0 and wyi(w, v) is finite (since is a homogeneous function 

w 

of second degree in the XiH and so has a finite expectation) and is bounded 
bsM Hence 2 t)) is finite and bounded as M . Further, 

as M —* <» 


( 66 ) 


AtfAt; 

2v 2v 


ilh ' 




Hence Z is bounded as Jlf — » » . From this result, 

w+ 2v 

together with (61) and (64) it follows that (Si 0 as Jlf — » « uniformly with 
respect to w. Putting 


(67) 


tp*" 2ir 


it will follow in a similar manner that (Si 
respect to w. Hence 


0 as JIf — > 00 uniformly with 


( 68 ) 






where 8m 8m + Sm and so 8m —*0 as M —* » uniformly with respect to w. 
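Hence, whatever be $\epsilon > 0$, there is a number $M'_\epsilon$ such that for all $M > M'_\epsilon$

(69) $$\left|\left[\frac{\partial \beta(w \mid \theta')}{\partial \theta'}\right]_{\theta'=0} - \frac{k}{2\pi}\iint_w u\,e^{-\frac12(u^2+v^2)}\,du\,dv\right| < \epsilon,$$

whatever be the region $w$. In particular we may take $w \equiv w_M$, and then, since $w_M$ is symmetric in $u$, we have

(70) $$\frac{k}{2\pi}\iint_{w_M} u\,e^{-\frac12(u^2+v^2)}\,du\,dv = 0,$$

and so

(71) $$\left|\left[\frac{\partial \beta(w_M \mid \theta')}{\partial \theta'}\right]_{\theta'=0}\right| < \epsilon \quad \text{for all } M > M'_\epsilon, \qquad \text{i.e.} \qquad \lim_{M\to\infty}\left[\frac{\partial \beta(w_M \mid \theta')}{\partial \theta'}\right]_{\theta'=0} = 0.$$

Hence the sequence of critical regions $(w_M)$ satisfies condition (iii) for a test of the type defined above. This completes the proof of Theorem 1.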

In the above theorem we have found a test which is unbiased in the limit for all cases except that for which $P_1 = P_2 = \frac12$. The following theorem derives the test appropriate to this special case, and it is found that in this instance the test takes a very simple form.

Theorem 2. If $P_1 = P_2 = \frac12$, the sequence of critical regions $(w_M)$ defined by

$$E \in w_M \ \text{if}\ |x_2 + x_3| > a, \qquad E \notin w_M \ \text{if}\ |x_2 + x_3| \le a,$$

where

$$x_i = \frac{m_i - \frac14 M}{(\frac14 M)^{1/2}} \quad (i = 2, 3), \qquad \frac{1}{\sqrt{2\pi}}\int_{-a}^{a} e^{-\frac12 x^2}\,dx = 1 - \alpha,$$

is associated with a test of the hypothesis $H_0(\theta = 0)$ of the type defined above at level of significance $\alpha$.

The proof of this theorem follows the same lines as that of Theorem 1 as far as inequality (42). On putting $P_1 = P_2 = \frac12$ in (42) we get

(76) (— Jot* — i»ns + iM — a»)* — i(w*i + m* — iM) > 04 


(77) $(x_2 + x_3 - a_6)^2 \ge a_7$.

The critical region $w_M$ defined in the statement of the theorem is of this form with $a_6 = 0$ and $a_7 = a^2$.

Hence the sequence of critical regions $(w_M)$ satisfies conditions (i) of the definition of a test of the type defined above. The sequence of critical regions may also be shown to satisfy conditions (ii) and (iii) for such a test by following the lines of the proof of Theorem 1 and noting that $x_2 + x_3 = 2M^{-1/2}(m_2 + m_3 - \frac12 M)$ tends to be distributed as a unit normal deviate as $M \to \infty$.
On account of the shape of the critical regions in the general case, I shall for the remainder of this paper call the tests derived in the above theorem the parabolic tests for the cases considered.


5. Application of the Parabolic Tests. For practical purposes the formulae derived above are inconvenient to use. I will therefore express them in terms of the deviations of the observed frequencies in the four cells from the frequencies "expected" when the hypothesis $H_0(\theta = 0)$ is true, i.e. in terms of the variables $d_i$, where

(78) $d_i = m_i - Mp_{0i} = x_i(Mp_{0i})^{1/2}$ $(i = 1, 2, 3, 4)$.

The test then becomes "reject the hypothesis $H_0$ at level of significance $\alpha$ if $v + Bu^2 > A$", where

(79) $$u = \frac{d_1(1 - P_1 - P_2) - d_2P_2 - d_3P_1}{\{MP_1P_2(1-P_1)(1-P_2)\}^{1/2}},$$

(80) $$v = -\frac{d_1(1 - P_1 - P_2)(P_1 + P_2 - 2P_1P_2) + d_2P_2(1-P_2)(1-2P_1) + d_3P_1(1-P_1)(1-2P_2)}{\{MP_1P_2(1-P_1)(1-P_2)[P_1(1-P_1)(1-2P_2)^2 + P_2(1-P_2)(1-2P_1)^2]\}^{1/2}},$$

(81) $$\frac{1}{2\pi}\int_{-\infty}^{\infty} e^{-\frac12 u^2}\left\{\int_{A-Bu^2}^{\infty} e^{-\frac12 v^2}\,dv\right\}du = \alpha,$$

(82) $$B = \left\{\frac{MP_1P_2(1-P_1)(1-P_2)}{P_1(1-P_1)(1-2P_2)^2 + P_2(1-P_2)(1-2P_1)^2}\right\}^{1/2},$$

except when $P_1 = P_2 = \frac12$. In the latter case reject the hypothesis $H_0$ if

(83) $$\left|2M^{-1/2}(d_2 + d_3)\right| > a,$$

where

(84) $$\frac{1}{\sqrt{2\pi}}\int_{-a}^{a} e^{-\frac12 t^2}\,dt = 1 - \alpha.$$

The application of this last case ($P_1 = P_2 = \frac12$) is straightforward: $a$ may be found from the tables of the Normal Probability Integral, $d_2$ and $d_3$ may be calculated from the data, and we may then see whether the inequality (83) is satisfied, and so assess our judgment of the hypothesis $H_0$.
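The special-case procedure of (83) and (84) is short enough to state as code. The following is a minimal Python sketch, assuming the four cell frequencies are supplied in the order $m_1, \dots, m_4$ used above; the function name `special_case_test` is hypothetical. With $P_1 = P_2 = \frac12$ every expected frequency is $\frac14 M$, so $d_i = m_i - \frac14 M$, and $a$ is the two-sided unit normal deviate for level $\alpha$.

```python
import math
from statistics import NormalDist


def special_case_test(counts, alpha=0.05):
    """Sketch of the P1 = P2 = 1/2 test: reject H0 if |2 M^(-1/2) (d2 + d3)| > a.

    counts: the four cell frequencies (m1, m2, m3, m4).
    Returns (statistic, a, reject).
    """
    m1, m2, m3, m4 = counts
    M = m1 + m2 + m3 + m4
    # Under H0 with P1 = P2 = 1/2 each cell has expected frequency M/4.
    d2 = m2 - M / 4.0
    d3 = m3 - M / 4.0
    statistic = 2.0 * (d2 + d3) / math.sqrt(M)
    # a satisfies (84): the central normal integral from -a to a equals 1 - alpha.
    a = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return statistic, a, abs(statistic) > a
```

For example, the counts (30, 20, 20, 30) give $d_2 + d_3 = -10$ and a statistic of $-2.0$, which exceeds $a \approx 1.96$ in absolute value, so $H_0$ would be rejected at the .05 level. Since $d_4 = -(d_1 + d_2 + d_3)$, the statistic also equals $-(m_1 - m_2 - m_3 + m_4)/M^{1/2}$.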

TABLE I

Significance of Symbols

A and B are connected by the relation (81):

$$\frac{1}{2\pi}\int_{-\infty}^{\infty} e^{-\frac12 u^2}\left\{\int_{A-Bu^2}^{\infty} e^{-\frac12 v^2}\,dv\right\}du = \alpha.$$

Table Ia: $\alpha = 0.05$, $p_{.05} = A - 3.8414588\,B$.
Table Ib: $\alpha = 0.01$, $p_{.01} = A - 6.6348966\,B$.

     B      p.05 (Table Ia)    p.01 (Table Ib)
     0         1.6449             2.3263
    1.00       0.322              0.289
    1.25        .256               .231
    1.50        .212               .192
    1.75        .181               .166
    2.00        .158               .144
    2.25        .141               .128
    2.50        .127               .116
    2.75        .116               .105
    3.00        .106               .096
    3.25        .098               .089
    3.50        .091               .082
    3.75        .084               .077
    4.00        .079               .072
    5           .063               .058
    6           .052               .048
    7           .045               .041
    8           .039               .036
    9           .035               .032
   10           .031               .029
   15           .021               .020
   20           .016               .014
   30           .010               .009
   40           .008               .007
   50           .006               .006


The general case is also straightforward, except for the determination of $A$ from equation (81). To facilitate this I have constructed Tables Ia and Ib. These tables correspond respectively to significance levels .05 and .01, and from them the value of $A$ corresponding to a given value of $B$ may be calculated. The quantity tabled, $p$, is the difference between $A$ and a multiple¹ (constant for a given level of significance and given with the table to which it applies) of $B$. To find $A$, therefore, $B$ is calculated, multiplied by the appropriate constant, and added to the quantity in the table corresponding to $B$. For large values of $B$ (40 and over) $p$ is small, and $A$ may be taken equal to the constant multiple of $B$.
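The tabled quantities can also be regenerated numerically from (81). The sketch below, in Python with only the standard library, evaluates the left side of (81) by Simpson's rule in $u$ (the inner integral over $v$ is a unit-normal tail probability) and solves for $A$ by bisection. The integration range $|u| \le 8$, the grid size and the function names are assumptions of this sketch, not a description of how the original tables were computed.

```python
import math
from statistics import NormalDist


def upper_tail(z):
    """P(Z > z) for a unit normal deviate Z."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def rejection_prob(A, B, half_width=8.0, n=8000):
    """Left side of (81): P(v + B u^2 > A) for independent unit normal u, v.

    The inner v-integral is the normal tail P(v > A - B u^2); the outer
    u-integral uses Simpson's rule on [-half_width, half_width] (n even).
    """
    h = 2.0 * half_width / n
    total = 0.0
    for i in range(n + 1):
        u = -half_width + i * h
        weight = 1 if i in (0, n) else (4 if i % 2 else 2)
        density = math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)
        total += weight * density * upper_tail(A - B * u * u)
    return total * h / 3.0


def critical_A(B, alpha, iterations=50):
    """Solve (81) for A by bisection; rejection_prob decreases as A grows."""
    lo, hi = -10.0, 10.0 + 70.0 * B
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if rejection_prob(mid, B) > alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def k_squared(alpha):
    """The constant multiple k_alpha^2 (two-sided unit normal deviate, squared)."""
    k = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return k * k
```

For instance, `critical_A(10.0, 0.01) - 10.0 * k_squared(0.01)` gives a value that can be set beside the entry for $B = 10$ in Table Ib, and `k_squared(0.05)` reproduces the constant 3.8414588. Bisection is used because the rejection probability is monotone in $A$, which makes the root bracket easy to choose.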

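In particular cases, when the values of $P_1$ and $P_2$ are substituted in the expression for $B$ (see Theorem 1 above) and in (79) and (80) above, these equations appear much less formidable. Thus in the case considered by R. A. Fisher [3], $P_1 = P_2 = \frac14$ and we get

(85) $$B = \left(\tfrac38 M\right)^{1/2}; \qquad u = 4(9M)^{-1/2}(2d_1 - d_2 - d_3); \qquad v = -4(6M)^{-1/2}(2d_1 + d_2 + d_3),$$

and the test becomes "reject the hypothesis $H_0$ at level of significance $\alpha$ when

(86) $$\phi = \frac{(2d_1 - d_2 - d_3)^2 - \frac32(2d_1 + d_2 + d_3)}{\frac{9}{16}\left(\frac83 M\right)^{1/2}} > A\text{"},$$

where $A$ is obtained from (81), or from Tables Ia, Ib, with $B = (\frac38 M)^{1/2}$.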

Example. Fisher [3] gives an example of the case $P_1 = P_2 = \frac14$. In the series of experiments that he quotes the observed results fall in the four categories respectively as follows:

$m_1 = 32$; $m_2 = 904$; $m_3 = 906$; $m_4 = 1997$; $M = 3839$.

Hence $d_1 = -207.9375$; $d_2 + d_3 = 370.375$. From (86), $\phi = 10863.1$; $B = 37.94239$. From the tables:

at the .05 level, $A_M = 3.8414588 \times 37.94239 + 0.0075 = 145.7615$;

at the .01 level, $A_M = 6.6348966 \times 37.94239 + 0.0065 = 251.750$.

Hence we reject the hypothesis that $\theta = 0$, i.e. that there is no linkage, since the value of $\phi$ is well outside even the .01 level of significance.
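The arithmetic of the example can be checked directly from (78), (85) and (86). The following minimal Python sketch assumes Fisher's counts in the order $m_1, \dots, m_4$ quoted above and $P_1 = P_2 = \frac14$, so that the null proportions are 1/16, 3/16, 3/16, 9/16; the corrections 0.0075 and 0.0065 are the tabular values used in the text, taken as given rather than recomputed.

```python
import math

# Fisher's counts as quoted above; P1 = P2 = 1/4 gives null proportions 1:3:3:9 over 16.
counts = (32, 904, 906, 1997)
p0 = (1 / 16, 3 / 16, 3 / 16, 9 / 16)

M = sum(counts)
d = [m - M * p for m, p in zip(counts, p0)]  # deviations d_i of (78)
d1, d2, d3 = d[0], d[1], d[2]

# Special-case forms (85) for P1 = P2 = 1/4.
B = math.sqrt(3 * M / 8)
u = 4 * (2 * d1 - d2 - d3) / math.sqrt(9 * M)
v = -4 * (2 * d1 + d2 + d3) / math.sqrt(6 * M)
phi = v + B * u * u

# The same criterion written in the closed form (86).
phi_86 = ((2 * d1 - d2 - d3) ** 2 - 1.5 * (2 * d1 + d2 + d3)) / (
    (9 / 16) * math.sqrt(8 * M / 3)
)

# Critical values A = k^2 B + p, with p taken from the text.
A_05 = 3.8414588 * B + 0.0075
A_01 = 6.6348966 * B + 0.0065
```

Computing $\phi$ both as $v + Bu^2$ and through (86) is a useful cross-check, since the two forms differ only by algebraic rearrangement.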



6. Power Function of the Tests. General Case. The parabolic test as described above has the desirable property that, of all tests (at level of significance $\alpha$) which are unbiased for large values of $M$, this test will detect small variations in $\theta$ most frequently. However, to get a clearer idea of the properties of this test we shall calculate, as accurately as may be practicable, the power function of the test.

¹ This multiple is equal to $k_\alpha^2$, where $\frac{1}{\sqrt{2\pi}}\int_{-k_\alpha}^{k_\alpha} e^{-\frac12 t^2}\,dt = 1 - \alpha$, $\alpha$ being the level of significance.

As a preliminary step we obtain a rough idea of the power function by making use of the concept of a limiting power function as stated by Neyman [6]. This may be defined as follows:

Let $E_{M'}$ denote the sample point corresponding to a sample of size $M'$, and put

(88) $P\{E_{M'} \in w \mid \theta'\} = \beta_{M'}(w \mid \theta')$,

where $\theta' = M'^{1/2}\theta$, $w$ being a fixed region. Supposing $\theta'$ kept fixed, let $M'$ increase and let

(89) $\beta_\infty(w \mid \theta') = \lim_{M'\to\infty} \beta_{M'}(w \mid \theta')$

if this limit exists.

Then $\beta_\infty(w \mid \theta')$ is the limiting power function of the test associated with the critical region $w$. It will be noted that the limiting power function is a function of $\theta'$.

In the problem under consideration the parabolic test when the sample size is $M$ is associated with the critical region $w_M$. Now it should be noted that in the definition of the limiting power function $w$ remains fixed. Therefore the limiting power function of the parabolic test for sample size $M$ is

(90) $\beta_\infty(w_M \mid \theta') = \lim_{M'\to\infty} \beta_{M'}(w_M \mid \theta')$.

The significance of the limiting power function is that for any $\epsilon > 0$ and for any $\theta'$ there is a number $M_{\epsilon,\theta'}$ such that for all $M > M_{\epsilon,\theta'}$ we have in our case (by Theorem 1 of the Appendix)

(91) $|\beta_M(w_M \mid \theta') - \beta_\infty(w_M \mid \theta')| < \epsilon$.

It should be noted, however, that the limiting power curve (the graph of the limiting power function against $\theta = \theta' M^{-1/2}$) may be only a very rough approximation to the actual power curve. Furthermore (Neyman [6, p. 83]) we cannot, in general, use the limiting power function of a test to answer the question:

"How large must we take our sample size $M$ to detect the falsehood of the hypothesis $H_0(\theta = 0)$ when actually $\theta = \theta'$, with a limiting probability of at least, say, 0.95?"

For if we form a table as below

    M        beta_inf(w_M | M^(1/2) theta')
    100
    1000

it is possible that $\beta_\infty(w_M \mid M^{1/2}\theta')$ may never attain the value 0.95.
Theorem 3. The limiting power function of the parabolic test is

(92) $$\beta_\infty(w_M \mid \theta') = \frac{1}{2\pi}\int_{-\infty}^{\infty} e^{-\frac12(u - k\theta')^2}\left\{\int_{A-Bu^2}^{\infty} e^{-\frac12 v^2}\,dv\right\}du$$

in all cases for which $0 < P_i < 1$ and $P_1$ and $P_2$ are not both equal to $\frac12$; here $k$ is the constant defined in (59). The proof of this theorem follows immediately from Theorem 1 of the Appendix by applying the transformation (46)-(48) and putting $\lambda = P_1P_2$.

The above remarks concerning special precautions to be taken with respect to the limiting power function suggest the necessity of studying the actual power function of the parabolic test by some other method.

With this object in view, a study was made of the distribution of the function $\phi = v + Bu^2$ for finite values of $M$, and in particular for $M = 100$ and $M = 3839$. $\phi$ is a discontinuous variate and, for any given value of $M$, has definite limits of variation arising from the limitations on the values of the variables $m_i$ stated in the inequalities (25), (26) above. These limits of variation of $\phi$ were found to be

(93) $$-\frac{18M + 1}{9\left(\frac83 M\right)^{1/2}} \le \phi \le \frac{9M^2 - 4M}{\left(\frac83 M\right)^{1/2}}$$

for the case $P_1 = P_2 = \frac14$. Hence when

$M = 100$: $-12.25 < \phi < 5486.85$,

$M = 3839$: $-75.89 < \phi < 1310795.75$.

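Also it was found that

(94) $$\mathcal{E}(\phi \mid \theta) = B\left\{1 + \frac{(1-2P_1)(1-2P_2)}{(1-P_1)(1-P_2)}(e^\theta - 1) + \frac{(M-1)P_1P_2}{(1-P_1)(1-P_2)}(e^\theta - 1)^2\right\},$$

where $\mathcal{E}(\phi \mid \theta)$ denotes the expected value of $\phi$ given the value of the parameter $\theta$. Thus when $P_1 = P_2 = \frac14$ we have $B = (\frac38 M)^{1/2}$ and so $\mathcal{E}(\phi \mid 0) = (\frac38 M)^{1/2}$. Hence when

$M = 100$: $\mathcal{E}(\phi \mid 0) = 6.12372$,

$M = 3839$: $\mathcal{E}(\phi \mid 0) = 37.94239$.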


It is thus seen that the distribution of $\phi$ might be represented by a Type III curve, since the distribution of $\phi$ has a finite lower bound and a very long positive tail. In order to fit a Type III curve, we must know the second moment of the curve as well as its lower bound and mean. The general expression for the second moment about zero is too complicated to be printed, and so only the numerical expressions obtained by giving special values to $M$ are given below. These are:

(95) (i) $M = 100$:
$$\mathcal{E}(\phi^2 \mid \theta) = 112.41667 + 165.62963(e^\theta - 1) + 2493.33333(e^\theta - 1)^2 + 1078.00000(e^\theta - 1)^3 + 4366.91667(e^\theta - 1)^4;$$

(96) (ii) $M = 3839$:
$$\mathcal{E}(\phi^2 \mid \theta) = 4318.79213 + 6397.29625(e^\theta - 1) + 3684321.24073(e^\theta - 1)^2 + 1636267.33256(e^\theta - 1)^3 + 261530062.11111(e^\theta - 1)^4.$$

Using the above results Type III curves were fitted to the distribution of $\phi$, and approximate values of the power functions $\beta_M(w_M \mid \theta)$, at level of significance .05, were calculated. This was done by evaluating $A_M$ at the .05 level and assuming the distribution of $\phi$ to be that given by the fitted curve. Then

(97) $\beta_M(w_M \mid \theta) \approx P\{\phi > A_M \mid \theta\}$.

The values obtained for the limiting and approximate power functions are given in Tables IIa, IIb. Unfortunately the agreement between the two is not satisfactory.

Special Case. For the cases $P_1 = P_2 = \frac12$ ($M = 100$, $M = 400$) power functions were calculated on the assumption that, for a given value of $\theta$, the random variable $2M^{-1/2}(d_2 + d_3)$ is distributed normally about a mean $-M^{1/2}(e^\theta - 1)$ with standard deviation $\{e^\theta(2 - e^\theta)\}^{1/2}$. This is approximately the case for the values of $M$ considered. The approximate power functions so calculated are given in Tables IIIa, IIIb.
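The normal approximation just described is easy to evaluate. The sketch below is a minimal Python illustration of that calculation, assuming the stated mean and standard deviation for $2M^{-1/2}(d_2 + d_3)$ and a two-sided critical value $a$ from (84); the function name `approx_power_special` is hypothetical. The mean follows from $\mathcal{E}(d_2 + d_3) = -\frac12 M(e^\theta - 1)$ when $P_1 = P_2 = \frac12$, and the formula applies only for $\theta < \log 2$.

```python
import math
from statistics import NormalDist


def approx_power_special(theta, M, alpha=0.05):
    """Normal-approximation power of the P1 = P2 = 1/2 test, as for Tables IIIa, IIIb.

    Treats X = 2 M^(-1/2) (d2 + d3) as normal with mean -M^(1/2) (e^theta - 1)
    and standard deviation (e^theta (2 - e^theta))^(1/2); requires theta < ln 2.
    """
    a = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    g = math.exp(theta)
    X = NormalDist(mu=-math.sqrt(M) * (g - 1.0), sigma=math.sqrt(g * (2.0 - g)))
    # Power = P(|X| > a) = P(X < -a) + P(X > a).
    return X.cdf(-a) + (1.0 - X.cdf(a))
```

At $\theta = 0$ the function returns the level $\alpha$ itself, and for other values it can be set against the entries of Tables IIIa and IIIb.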


7. Parabolic Test and $\chi^2$ Test. It is interesting to note the close connection between the parabolic test and the $\chi^2$ test as introduced for intuitive reasons and normally used in testing for linkage. The $\chi^2$ test consists of calculating the quantity

(98) $$\chi = \frac{(1-P_1)(1-P_2)m_1 - P_2(1-P_1)m_2 - P_1(1-P_2)m_3 + P_1P_2m_4}{\{MP_1P_2(1-P_1)(1-P_2)\}^{1/2}}$$

and rejecting the hypothesis $H_0(\theta = 0)$ if $|\chi| > a$, where

(99) $$\frac{1}{\sqrt{2\pi}}\int_{-a}^{a} e^{-\frac12 t^2}\,dt = 1 - \alpha.$$

In the special case ($P_1 = P_2 = \frac12$) the parabolic test and the $\chi^2$ test are identical, while comparing (98) and (79) we see that in the general case

(100) $u = \chi$.

Hence in the general case the criterion used in the parabolic test may be written

(101) $\phi = v + B\chi^2$.
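The identity (100) and the general forms (79), (80) and (82) can be checked numerically. The following Python sketch is an illustration, assuming the cell order $P_1P_2$, $P_1(1-P_2)$, $P_2(1-P_1)$, $(1-P_1)(1-P_2)$ used throughout; the function name `parabolic_statistics` is hypothetical. Applied to Fisher's data with $P_1 = P_2 = \frac14$, it should agree with the special forms in (85).

```python
import math


def parabolic_statistics(counts, P1, P2):
    """Return (u, v, B, chi) from (79), (80), (82) and (98) for cell counts m1..m4.

    Null cell probabilities: P1 P2, P1 (1 - P2), P2 (1 - P1), (1 - P1)(1 - P2).
    Undefined when P1 = P2 = 1/2, where the denominator of B and v vanishes.
    """
    m1, m2, m3, m4 = counts
    M = m1 + m2 + m3 + m4
    Q1, Q2 = 1.0 - P1, 1.0 - P2
    p0 = (P1 * P2, P1 * Q2, P2 * Q1, Q1 * Q2)
    d1, d2, d3, _ = (m - M * p for m, p in zip(counts, p0))  # d_i of (78)
    r = P1 * Q1 * P2 * Q2
    D = P1 * Q1 * (1 - 2 * P2) ** 2 + P2 * Q2 * (1 - 2 * P1) ** 2
    u = (d1 * (1 - P1 - P2) - d2 * P2 - d3 * P1) / math.sqrt(M * r)
    v = -(
        d1 * (1 - P1 - P2) * (P1 + P2 - 2 * P1 * P2)
        + d2 * P2 * Q2 * (1 - 2 * P1)
        + d3 * P1 * Q1 * (1 - 2 * P2)
    ) / math.sqrt(M * r * D)
    B = math.sqrt(M * r / D)
    # (98) written directly in the counts m_i.
    chi = (Q1 * Q2 * m1 - P2 * Q1 * m2 - P1 * Q2 * m3 + P1 * P2 * m4) / math.sqrt(M * r)
    return u, v, B, chi
```

The equality of `u` and `chi` reflects the fact that the expected counts satisfy the linear relation in (98) exactly, so replacing each $m_i$ by $d_i$ and eliminating $d_4$ turns (98) into (79).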

(1) Large Samples. For large samples the first term of the expression $v + B\chi^2$ is usually of small importance, since $v$ is of the form $M^{-1/2} \times$ (linear function of the $d_i$'s), while $B\chi^2$ is of the form $M^{-1/2} \times$ (quadratic function of the $d_i$'s).

For such samples the $\chi^2$ test and parabolic test would appear to be nearly equivalent.


TABLE II

Limiting and Approximate Power Functions of Parabolic Test
$P_1 = P_2 = \frac14$, $-\infty < \theta < 1.386$

Table IIa. M = 100

     θ       Power (limiting)    Power (approximate)
   -2.00                             0.90870
   -1.50         0.99880
   -1.40                             0.77656
   -1.20         0.97915             0.69505
   -1.05         0.93786
   -1.00                             0.58580
   -0.90         0.85024
   -0.75         0.70467             0.42755
   -0.60         0.51532
   -0.45         0.32258             0.21849
   -0.30         0.16986             0.12504
   -0.15         0.07905             0.05689
   -0.10         0.06280             0.04438
   -0.05         0.05318             0.03866
    0.00         0.05000             0.04069
    0.05         0.05318             0.05021
    0.10         0.06280             0.07429
    0.15         0.07905
    0.30         0.16986             0.26559
    0.45         0.32258
    0.60         0.51532             0.76864
    0.75         0.70467             0.94245

Table IIb. M = 3839

     θ       Power (limiting)    Power (approximate)
   -0.25         0.99932             0.99853
   -0.20         0.98502             0.97521
   -0.15         0.87243             0.83620
   -0.10         0.54197             0.52066
   -0.05         0.17827             0.19223
    0.00         0.05000             0.04111
    0.05         0.17827             0.21568
    0.10         0.54197             0.69517
    0.15         0.87243             0.91641
    0.20         0.98502             0.99640
    0.25         0.99932             0.99999

Theorem 4. The limiting power function of the $\chi^2$ test is

(102) $$\beta_\infty(w_{\chi^2} \mid \theta') = 1 - \frac{1}{\sqrt{2\pi}}\int_{-a-k\theta'}^{a-k\theta'} e^{-\frac12 t^2}\,dt$$

($w_{\chi^2}$ denotes the region defined by the inequality $|\chi| > a$).

This theorem may be proved by applying (46)-(48) to $Q_\theta(x_1, x_2, x_3)$ in Theorem 1 of the Appendix, and noting that $u = \chi$ by (100).
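Both limiting power functions can be evaluated numerically, which makes the comparison in (103) concrete. The Python sketch below assumes $\theta' = M^{1/2}\theta$ and the constant $k$ of (59); it evaluates (102) in closed form and (92) by Simpson's rule in $u$, with the inner $v$-integral written as a normal tail. The function names, the integration range (mean $\pm 8$) and the grid size are assumptions of this sketch.

```python
import math
from statistics import NormalDist

_Z = NormalDist()


def k_coefficient(P1, P2):
    """k of (59): (P1 P2)^(1/2) (1 - P1)^(-1/2) (1 - P2)^(-1/2)."""
    return math.sqrt(P1 * P2 / ((1.0 - P1) * (1.0 - P2)))


def limiting_power_chi(theta, M, P1, P2, alpha=0.05):
    """(102): limiting power of the chi^2 test at theta' = M^(1/2) theta."""
    a = _Z.inv_cdf(1.0 - alpha / 2.0)
    shift = k_coefficient(P1, P2) * math.sqrt(M) * theta
    return 1.0 - (_Z.cdf(a - shift) - _Z.cdf(-a - shift))


def limiting_power_parabolic(theta, M, P1, P2, A, B, n=8000):
    """(92): P(v + B u^2 > A) with u ~ N(k theta', 1) and v ~ N(0, 1) independent."""
    mean = k_coefficient(P1, P2) * math.sqrt(M) * theta
    lo, hi = mean - 8.0, mean + 8.0
    h = (hi - lo) / n
    total = 0.0
    for i in range(n + 1):
        u = lo + i * h
        weight = 1 if i in (0, n) else (4 if i % 2 else 2)
        # Outer normal density in u times the inner tail P(v > A - B u^2).
        total += weight * _Z.pdf(u - mean) * (1.0 - _Z.cdf(A - B * u * u))
    return total * h / 3.0
```

With $P_1 = P_2 = \frac14$ and $M = 100$, taking $B = (\frac38 M)^{1/2}$ and $A$ from Table Ia, the parabolic value at $\theta = -0.60$ comes out slightly below the $\chi^2$ value, as (103) requires; for $M = 3839$ the $\chi^2$ values can be set against the limiting column of Table IIb.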
We notice that $\beta_\infty(w_{\chi^2} \mid \theta')$, for a given value of $\theta'$, has the same value for all values of $M$, unlike the limiting power function $\beta_\infty(w_M \mid \theta')$ of the parabolic test. It is this point which accounts for the seeming paradox that, despite the manner in which the parabolic test was defined, for all values of $\theta'$ and $M$

(103) $\beta_\infty(w_{\chi^2} \mid \theta') \ge \beta_\infty(w_M \mid \theta')$,

as may be deduced from (92) and (102). This does not mean that for any given $\theta$ and all $M$ sufficiently large the power function of the $\chi^2$ test, $\beta_M(w_{\chi^2} \mid \theta)$, is necessarily not less than the power function of the parabolic test, $\beta_M(w_M \mid \theta)$. For although, given any $\epsilon > 0$, there is a number $M_{\epsilon,\theta}$ such that if $M > M_{\epsilon,\theta}$

(104) $|\beta_M(w_{\chi^2} \mid \theta) - \beta_\infty(w_{\chi^2} \mid \theta')| < \epsilon$

and

(105) $|\beta_M(w_M \mid \theta) - \beta_\infty(w_M \mid \theta')| < \epsilon$,

it may be that for such values of $M$

(106) $0 < \beta_\infty(w_{\chi^2} \mid \theta') - \beta_\infty(w_M \mid \theta') < 2\epsilon$.

TABLE III

Approximate Power Function
$P_1 = P_2 = \frac12$, $-\infty < \theta < 0.693$

Table IIIa. M = 100

     θ        Power
   -0.45     0.96288
   -0.40     0.92161
   -0.35     0.85072
   -0.30     0.74351
   -0.25     0.60197
   -0.20     0.44054
   -0.15     0.28380
   -0.10     0.15727
   -0.05     0.07737
    0.00     0.05000
    0.05     0.08029
    0.10     0.18177
    0.15     0.36464
    0.20     0.60278
    0.25     0.82071
    0.30     0.94975
    0.35     0.99299

Table IIIb. M = 400

     θ        Power
   -0.25     0.99424
   -0.20     0.95482
   -0.15     0.79787
   -0.10     0.47734
   -0.05     0.16378
   -0.02     0.06810
    0.00     0.05000
    0.02     0.06885
    0.05     0.17609
    0.10     0.56737
    0.15     0.90213
    0.20     0.99431
    0.25     0.99995
The above results show, however, how close the agreement between the power functions of the two tests is for large values of $M$. In fact we have

(107) $\lim_{M\to\infty} \beta_\infty(w_M \mid \theta') = \beta_\infty(w_{\chi^2} \mid \theta')$.

This may be easily proved, since as $M$ increases $w_M$ approximates to $w_{\chi^2}$.

(2) Small Samples. In order to obtain some idea of the relations between the two tests when $M$ is small (i.e. less than 100), the case $P_1 = P_2 = \frac14$, $M = 32$ was considered in some detail.

In this case our tests at the 5% level of significance are respectively:

$\chi^2$ test, reject if

(108) $|2y - z| > 8.315$;

parabolic test, reject if

(109) $(2y - z)^2 - \frac32(2y + z) > 69.576$,

where

(110) $y = d_1$, $z = d_2 + d_3$.

All samples for which the verdicts of the two above tests would not agree were obtained. These were as follows:

(a) Samples for which $H_0$ is accepted by the $\chi^2$ test and rejected by the parabolic test. Probability of drawing a sample of this type when $H_0$ is true is 0.00320.

(b) Samples for which $H_0$ is rejected by the $\chi^2$ test and accepted by the parabolic test:

    y:   0   1   2   3   5   6   7   8   8   9   9
    z:   9  11  13  15   1   3   5   6   7   8   9

Probability of drawing a sample of this type when $H_0$ is true is 0.00038.

Thus the probability of the two tests giving different verdicts when $H_0$ is in fact true is only 0.00358.

It will be noted that the above results imply that

(111) $\beta_M(w_M \mid 0) - \beta_M(w_{\chi^2} \mid 0) = 0.00320 - 0.00038 = 0.00282$;

i.e. that the true levels of significance of the two tests are not equal. This is to be expected, because of the discontinuity of the probability distribution of sample points, which makes it unlikely that the level of significance of either test is exactly .05.

Similarly we can obtain values of $\beta_M(w_M \mid \theta) - \beta_M(w_{\chi^2} \mid \theta)$, the differences in the powers of the two tests with respect to various alternative hypotheses. These values were obtained for a few values of $\theta$.
      θ      beta_M(w_M | θ) - beta_M(w_chi2 | θ)
    -0.5            0.01625
     0.0            0.00282
     0.5           -0.00006

These figures indicate that the parabolic test detects negative $\theta$'s better than the $\chi^2$ test, but that the $\chi^2$ test detects positive $\theta$'s better than the parabolic test, although the advantage in this latter case is minute.
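Because $M = 32$ is small, the disagreement probabilities can be recomputed by complete enumeration. The Python sketch below is one way to do this, assuming $P_1 = P_2 = \frac14$ and the thresholds 8.315 and 69.576 of (108) and (109). Both criteria depend on $m_2$ and $m_3$ only through $m_2 + m_3$, so the sketch sums over the trinomial law of $(m_1, m_2 + m_3, m_4)$. The function names are hypothetical.

```python
import math

M = 32


def cell_probs(theta):
    """Cell probabilities (116) for P1 = P2 = 1/4 under H_theta (theta < ln 4)."""
    g = math.exp(theta)
    p1 = g / 16
    p2 = p3 = 0.25 * (1 - g / 4)
    return p1, p2, p3, 1 - p1 - p2 - p3


def verdicts(m1, m23):
    """(chi2 rejects, parabolic rejects) from (108)-(110), with y = d1, z = d2 + d3."""
    y = m1 - M / 16
    z = m23 - 6 * M / 16
    chi2_rejects = abs(2 * y - z) > 8.315
    parabolic_rejects = (2 * y - z) ** 2 - 1.5 * (2 * y + z) > 69.576
    return chi2_rejects, parabolic_rejects


def disagreement(theta=0.0):
    """Return P(parabolic only rejects), P(chi2 only rejects), the (y, z) points
    where only the chi2 test rejects, and the total probability (a check)."""
    p1, p2, p3, p4 = cell_probs(theta)
    q23 = p2 + p3
    par_only = chi_only = total = 0.0
    chi_only_points = set()
    for m1 in range(M + 1):
        for m23 in range(M - m1 + 1):
            m4 = M - m1 - m23
            prob = (math.comb(M, m1) * math.comb(M - m1, m23)
                    * p1 ** m1 * q23 ** m23 * p4 ** m4)
            total += prob
            chi_rej, par_rej = verdicts(m1, m23)
            if par_rej and not chi_rej:
                par_only += prob
            elif chi_rej and not par_rej:
                chi_only += prob
                chi_only_points.add((m1 - M // 16, m23 - 6 * M // 16))
    return par_only, chi_only, chi_only_points, total
```

At $\theta = 0$ the first two outputs can be set against the probabilities 0.00320 and 0.00038 quoted for cases (a) and (b), and the third against the $(y, z)$ list for case (b). Calling `disagreement(-0.5)` or `disagreement(0.5)` and subtracting the second output from the first gives the corresponding power differences.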

The critical regions associated with the two tests may be represented by regions in the $(y, z)$ plane. The critical region $w_M$ for the parabolic test will be defined by

(112) $(2y - z)^2 - \frac32(2y + z) > v$

and that for the $\chi^2$ test, $w_{\chi^2}$, by

(113) $(2y - z)^2 > v'$,

where $v > v'$.

$w_{\chi^2}$ is therefore the complement of the region lying between the lines $L_1$, $L_2$ with equations $2y - z = \pm\sqrt{v'}$; $w_M$ lies outside the parabola $K$ with equation $(2y - z)^2 - \frac32(2y + z) = v$.

Since $v$ is close to $v'$, $K$ meets $L_1$, $L_2$ at points near the respective intersections of $L_1$, $L_2$ with the line $2y + z = 0$. See Figure 1.

In the diagram the regions $V_1$, $V_2$ contain all sample points for which the $\chi^2$ test rejects and the parabolic test accepts $H_0$; $U_1$, $U_2$ contain all sample points for which the $\chi^2$ test accepts and the parabolic test rejects $H_0$.

For a given value of $\theta$ it is known that the probability distribution is approximately such that the quantity

(114) $$\frac{\{y - \frac{1}{16}M(e^\theta - 1)\}^2}{\frac{1}{16}M + \frac{1}{16}M(e^\theta - 1)} + \frac{\{z + \frac18 M(e^\theta - 1)\}^2}{\frac38 M - \frac18 M(e^\theta - 1)} + \frac{\{y + z + \frac{1}{16}M(e^\theta - 1)\}^2}{\frac{9}{16}M + \frac{1}{16}M(e^\theta - 1)}$$

is distributed as $\chi^2$ with 2 degrees of freedom.

The ellipses of equal density (on which (114) is constant) have centers at the points $(\frac{1}{16}M[e^\theta - 1], -\frac18 M[e^\theta - 1])$, which must lie on the line $2y + z = 0$. When $\theta = 0$ the center is at the origin, and the major and minor axes of the ellipse make angles of approximately 99.5° and 9.5° respectively with the $y$-axis. For small changes in $\theta$ the angles of inclination of the major and minor axes of the ellipse to the coordinate axes are not greatly changed, and we see that as the center of the ellipse moves along the line $2y + z = 0$ we have

(i) $\theta$ increasing: the center moves downwards, tending to increase $P\{E \in U_2\} - P\{E \in V_2\}$, while $P\{E \in V_1\}$ and $P\{E \in U_1\}$ both become small. Thus $\beta_M(w_M \mid \theta)$ tends to increase more quickly than $\beta_M(w_{\chi^2} \mid \theta)$.
(ii) $\theta$ decreasing: here we have the opposite effect, and $\beta_M(w_M \mid \theta)$ tends to increase more slowly than $\beta_M(w_{\chi^2} \mid \theta)$.

These conclusions agree qualitatively with those drawn in the case $M = 32$. (N.B. In the case $M = 32$ no sample points fall into the region $U_1$, because no points in $U_1$ satisfy the inequalities (25), (26).)


8. Some Geometrical Considerations. In this section we shall consider the manner in which the situations dealt with above may be interpreted in terms of geometrical concepts. It will be convenient to consider as variables $n_i = m_i/M$. The sample space $W(n)$ is then bounded by the four planes

(115) $n_i = 0$ $(i = 1, 2, 3)$, $\qquad \sum_{i=1}^{3} n_i = 1$.

In this space, corresponding to any admissible hypothesis $H_\theta$ specifying a value of $\theta$, there is a point $T_\theta$ with coordinates $(p_1, p_2, p_3)$, where

(116) $p_1 = P_1P_2e^\theta$, $\quad p_2 = P_1(1 - P_2e^\theta)$, $\quad p_3 = P_2(1 - P_1e^\theta)$.

These are the proportions of results expected in the first three cells, if the hypothesis $H_\theta$ specifying $\theta$ be true.

Now, if $H_\theta$ be true, we have

(117) $P\{n_1 = n'_1,\ n_2 = n'_2,\ n_3 = n'_3,\ n_4 = n'_4\} \approx c\,e^{-\frac12\chi^2_\theta}$,

where $c$ is constant for a fixed sample size $M$, and

(118) $$\chi^2_\theta = M\left\{\sum_{i=1}^{3}\frac{(n'_i - p_i)^2}{p_i} + \frac{\left[\sum_{i=1}^{3}(n'_i - p_i)\right]^2}{1 - \sum_{i=1}^{3} p_i}\right\}.$$

Hence the most frequent position(s) of the sample point $E$ will be somewhere near the point $T_\theta$, which I shall therefore call the center of density. It will be noticed that, whatever be the value of $\theta$, the point $T_\theta$ must lie on the line

(119) $n_1 - P_1P_2 = -[n_2 - P_1(1 - P_2)] = -[n_3 - P_2(1 - P_1)]$.

This line, a segment of which is the locus of the center of density for our set of admissible hypotheses, will be called the line of density.
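The center of density and the line of density are simple enough to check by computation. The following Python sketch, with hypothetical function names, evaluates $T_\theta$ from (116), tests the collinearity condition (119), and inverts $p_1 = P_1P_2e^\theta$ to recover $\theta$ from a chosen $p_1$, as is needed for hypotheses such as (122) to (125) below.

```python
import math


def center_of_density(theta, P1, P2):
    """Coordinates (p1, p2, p3) of T_theta from (116)."""
    g = math.exp(theta)
    return P1 * P2 * g, P1 * (1 - P2 * g), P2 * (1 - P1 * g)


def on_line_of_density(point, P1, P2, tol=1e-12):
    """Check (119): n1 - P1 P2 = -[n2 - P1 (1 - P2)] = -[n3 - P2 (1 - P1)]."""
    n1, n2, n3 = point
    a = n1 - P1 * P2
    b = -(n2 - P1 * (1 - P2))
    c = -(n3 - P2 * (1 - P1))
    return abs(a - b) < tol and abs(a - c) < tol


def theta_from_p1(p1, P1, P2):
    """Invert p1 = P1 P2 e^theta."""
    return math.log(p1 / (P1 * P2))
```

Every point returned by `center_of_density` lies on the line (119), since each coordinate differs from its null value by $\pm P_1P_2(e^\theta - 1)$. For $P_1 = P_2 = \frac14$, `theta_from_p1(3 / 32, 0.25, 0.25)` gives $\log 1.5 \approx 0.41$.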

In this space the parabolic test corresponds to a critical region comprising the exterior of a parabolic cylinder. The equation of the boundary of this critical region at level of significance .05 was found for the case $P_1 = P_2 = \frac14$, and a model made of it. Also included in the model were the ellipsoids

(120) $\chi^2_\theta = K_{.05}$,

where $K_{.05}$ is a constant so chosen that

(121) $P\{\chi^2_\theta > K_{.05} \mid H_\theta\} = .05$,

corresponding to

(i) the case when $H_0$ is true;

(ii) the cases when

(122) (a) $p_1 = \frac{3}{32}$; $p_2 = p_3 = \frac{5}{32}$; $p_4 = \frac{19}{32}$; i.e. $\theta = 0.41$;

(123) (b) $p_1 = \frac{1}{32}$; $p_2 = p_3 = \frac{7}{32}$; $p_4 = \frac{17}{32}$; i.e. $\theta = -0.69$.

It was found that in the case $P_1 = P_2 = \frac14$ one axis of all the $\chi^2_\theta$-ellipsoids was perpendicular to the plane through the line of density and the axis of $n_1$. The generators of the boundary of the parabolic acceptance region are also perpendicular to this plane. (By "acceptance region" is meant the complement of the critical region. The acceptance region may be written symbolically $\bar w_M$.) There were further added to the model the intersections with this plane of the ellipsoids at probability level .01, corresponding to the three hypotheses considered above ($\theta = 0$, 0.41, $-0.69$) and two others, viz.

(124) $p_1 = \frac{5}{32}$; $p_2 = p_3 = \frac{3}{32}$; $p_4 = \frac{21}{32}$; i.e. $\theta = 0.92$,

(125) $p_1 = \frac{1}{64}$; $p_2 = p_3 = \frac{15}{64}$; $p_4 = \frac{33}{64}$; i.e. $\theta = -1.39$.
For convenience in making the model to a simple scale (1 unit = 150 cms.) it was found necessary to take the sample size $M$ as 1312.5. The model is shown in Figure 2. It will be seen that the acceptance region for the parabolic test is approximately enclosed between two parallel planes perpendicular to the plane common to the line of density and the axis of $n_1$. These two planes, in fact, enclose the acceptance region for the $\chi^2$ test. The vertex of the normal parabolic section of the parabolic acceptance region is at a comparatively great distance "below" the plane $n_1 = 0$.

Fig. 2.

As an interesting digression we may use our model to compare qualitatively the parabolic test with yet a third possible test of $H_0$. This test is to reject $H_0$ at level of significance .05 if

(126) $\chi^2_0 > K_{.05}$

and may be called the $\chi^2_0$ test. The $\chi^2_0$-ellipsoid shown in the model is the acceptance region for this test. It will be noticed that when $\theta \ne 0$ the ellipsoids of equal density include somewhat more of the acceptance region of the $\chi^2_0$ test than of the parabolic acceptance region. This means that the $\chi^2_0$ test would detect that the hypothesis $H_0(\theta = 0)$ is false, in these cases, less frequently than would the parabolic and $\chi^2$ tests. We also notice that the center of density $T_\theta$ leaves the parabolic acceptance region before it leaves the acceptance region of the $\chi^2_0$ test as it moves along the line of density from the point where $\theta = 0$, whether the direction of motion of $T_\theta$ corresponds to $\theta$ increasing or decreasing. This also indicates that the $\chi^2_0$ test would act less efficiently than the other two tests.


9. Appendix. In this appendix are obtained various results which, while essential to the main argument, would appear as digressions if they were interpolated as required. The numbering of equations in this appendix does not continue from that of the previous sections, but forms a separate group.

Lemma. If $f_0(m), f_1(m), \dots, f_s(m)$ be $(s + 1)$ functions of the $k$ variables $m_1, m_2, \dots, m_k$, which are zero except for a finite number of sets of integral values of $m_1, \dots, m_k$; and if $w_0$ be a region in the space of $m$'s such that

(1) $f_0(m) > \sum_{i=1}^{s} a_i f_i(m)$ in $w_0$,

(2) $f_0(m) \le \sum_{i=1}^{s} a_i f_i(m)$ in $\bar w_0$,

$a_1, a_2, \dots, a_s$ being arbitrary constants; then if $w$ be any region such that

(3) $\sum_w f_i(m) = \sum_{w_0} f_i(m)$ $(i = 1, \dots, s)$,

we shall have

(4) $\sum_w f_0(m) \le \sum_{w_0} f_0(m)$.

Proof. Let

(5) $\Delta = \sum_{w_0} f_0(m) - \sum_w f_0(m)$

(6) $\phantom{\Delta} = \sum_{w_0 - ww_0} f_0(m) - \sum_{w - ww_0} f_0(m)$,

where $ww_0$ denotes the common part of $w$ and $w_0$.

Hence the region $w - ww_0$, consisting of those points of $w$ which are not in $ww_0$, and so not in $w_0$, is contained in $\bar w_0$. Similarly the region $w_0 - ww_0$ is contained in $w_0$. Hence, by inequalities (1) and (2),

(7) $$\Delta \ge \sum_{w_0 - ww_0}\Big\{\sum_{i=1}^{s} a_i f_i(m)\Big\} - \sum_{w - ww_0}\Big\{\sum_{i=1}^{s} a_i f_i(m)\Big\} = \sum_{i=1}^{s} a_i\Big\{\sum_{w_0 - ww_0} f_i(m) - \sum_{w - ww_0} f_i(m)\Big\}.$$
Since the total number of terms in each double summation is finite, we have

(8) $\Delta \ge \sum_{i=1}^{s} a_i\Big\{\sum_{w_0} f_i(m) - \sum_w f_i(m)\Big\}$.

But

(9) $\sum_w f_i(m) = \sum_{w_0} f_i(m)$ $(i = 1, \dots, s)$.

Hence

$\Delta \ge 0$, and $\sum_w f_0(m) \le \sum_{w_0} f_0(m)$.

A lemma similar to the lemma above, where the $f$'s are taken to be integrable functions and summation over the regions $w$, $w_0$ is replaced by integration over these regions, is given by Neyman and Pearson [9]. The proof given above follows the lines of the proof given in that paper.
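The lemma can be illustrated by brute force on a small finite example. The Python sketch below takes the case $s = 1$ with exact rational weights, builds $w_0 = \{m : f_0(m) > a f_1(m)\}$ so that (1) and (2) hold, and checks every region $w$ satisfying (3). The weights, the constant $a = 1$ and the function names are arbitrary choices for illustration only.

```python
from fractions import Fraction
from itertools import combinations


def best_region(f0, f1, a):
    """w0 of the lemma with s = 1: the points where f0(m) > a f1(m)."""
    return {m for m in f0 if f0[m] > a * f1[m]}


def check_lemma(f0, f1, a):
    """Check (4) for every w satisfying (3); return (holds, number of such w)."""
    w0 = best_region(f0, f1, a)
    target = sum(f1[m] for m in w0)
    best = sum(f0[m] for m in w0)
    points = list(f0)
    competitors = 0
    for r in range(len(points) + 1):
        for w in combinations(points, r):
            if sum(f1[m] for m in w) == target:  # condition (3), exact with Fractions
                competitors += 1
                if sum(f0[m] for m in w) > best:  # would contradict (4)
                    return False, competitors
    return True, competitors


f1 = {0: Fraction(1, 8), 1: Fraction(1, 8), 2: Fraction(1, 4),
      3: Fraction(1, 4), 4: Fraction(1, 8), 5: Fraction(1, 8)}
f0 = {0: Fraction(1, 20), 1: Fraction(3, 20), 2: Fraction(1, 10),
      3: Fraction(2, 5), 4: Fraction(1, 4), 5: Fraction(1, 20)}
```

Here `check_lemma(f0, f1, Fraction(1))` compares $w_0 = \{1, 3, 4\}$ with every other subset of the same $f_1$-total. Exact fractions are used so that the equality in (3) is tested without rounding error.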

Theorem 1. Suppose that, in a quadrinomial population:

(i) the cell probabilities are dependent on the number $M$ of trials made, and are given by

(10) $p_1 = p_{01} + \varphi_M$, $\quad p_2 = p_{02} - \varphi_M$, $\quad p_3 = p_{03} - \varphi_M$, $\quad p_4 = p_{04} + \varphi_M$,

where

(11) $\sum_{i=1}^{4} p_{0i} = \sum_{i=1}^{4} p_i = 1$

and

(12) $\varphi_M = \lambda(e^{\theta' M^{-1/2}} - 1)$;

(ii)

(13) $x_i = (m_i - Mp_{0i})/(Mp_{0i})^{1/2}$ $(i = 1, 2, 3, 4)$,

where $m_i$ = number of results falling in the $i$-th cell;

(iii) $w(x)$, or briefly $w$, is a region in the space $W$ of $x_1, x_2, x_3$; and $P_M(w)$ is the integral probability law of $w$ corresponding to the values $p_1, p_2, p_3, p_4$ of the cell probabilities given in (10) above when we have $M$ independent trials.

Then

(14) $$P_M(w) - \frac{1}{(2\pi)^{3/2}p_{04}^{1/2}}\iiint_w e^{-\frac12 Q_{\theta'}(x_1, x_2, x_3)}\,dx_1\,dx_2\,dx_3 \to 0$$

uniformly over $W$ as $M \to \infty$, where

(15) $$Q_{\theta'}(x_1, x_2, x_3) = \sum_{i=1}^{3} x_i^2\Big(1 + \frac{p_{0i}}{p_{04}}\Big) + \frac{2}{p_{04}}\sum_{1 \le i < j \le 3} x_ix_j(p_{0i}p_{0j})^{1/2} - \frac{2\lambda\theta'}{p_{04}}\Big\{x_1\frac{p_{04} - p_{01}}{p_{01}^{1/2}} - x_2\frac{p_{02} + p_{04}}{p_{02}^{1/2}} - x_3\frac{p_{03} + p_{04}}{p_{03}^{1/2}}\Big\} + \lambda^2\theta'^2\sum_{i=1}^{4}\frac{1}{p_{0i}}.$$

This theorem may be proved by the same method as that used by F. N. David [2] in proving the generalized theorem of Laplace.

I would like to thank Professor Neyman for his invaluable suggestions and advice in the preparation of this paper.

REFERENCES

[1] Joseph F. Daly, Ann. Math. Stat., Vol. 11 (1940), p. 1.

[2] F. N. David, Stat. Res. Mem., Vol. 2 (1938), p. 69.

[3] R. A. Fisher, Statistical Methods for Research Workers, 6th ed. (1936), Oliver and Boyd, London.

[4] J. Neyman, Phil. Trans., A, Vol. 236 (1937), p. 333.

[5] J. Neyman, Skand. Aktuar. Tidskr., Vol. 20 (1937), p. 149.

[6] J. Neyman, Ann. Math. Stat., Vol. 9 (1938), p. 69.

[7] J. Neyman and E. S. Pearson, Biometrika, Vol. 20A (1928), p. 223.

[8] J. Neyman and E. S. Pearson, Phil. Trans., A, Vol. 231 (1933), p. 289.

[9] J. Neyman and E. S. Pearson, Stat. Res. Mem., Vol. 1 (1936), p. 1.

[10] J. Neyman and E. S. Pearson, Stat. Res. Mem., Vol. 2 (1938), p. 26.


Department of Statistics, 
University College, 
London, England 



REDUCTION OF A CERTAIN CLASS OF COMPOSITE 
STATISTICAL HYPOTHESES 

By George W. Brown 

1. Introduction. A situation frequently met in sampling theory is the following: $x$ has distribution $f(x, \theta)$, where $\theta$ is an unknown parameter, and for samples $(x_1, \dots, x_n)$ there exists in the sample space $E_n$ a family of $(n - 1)$-dimensional manifolds upon each of which the distribution is independent of $\theta$; in addition there is a residual one-dimensional manifold available for estimating $\theta$. For example, suppose there exists a sufficient statistic $T$ for $\theta$; then on the manifolds $T = T_0$ there is defined an induced distribution which is independent of the parameter.

A similar situation is observed when $\theta$ is a "location" or "scale" parameter. Let $x$ have the distribution $f(x - a)$ for some $a$; then the set $(x_2 - x_1, x_3 - x_1, \dots, x_n - x_1)$, or any equivalent set, such as $(x_2 - \bar x, \dots, x_n - \bar x)$, have a joint distribution independent of $a$, and there is a residual distribution corresponding to each particular configuration $(x_2 - x_1, \dots, x_n - x_1)$. Fisher [1] and Pitman [6] have examined the residual distributions in connection with the problem of estimating scale and location parameters. In this paper we shall be concerned primarily, not with the residual distribution, but with the remainder of the sample information, corresponding to the $(n - 1)$-dimensional distribution which is independent of the parameter. It is found, in a rather broad class of distributions, that the part of the sample not used for estimation determines, except for the parameter value, the original functional form of the distribution of $x$.

This paper is devoted mainly to a study of particular classes of distributions having the property mentioned above. We consider also the theoretical application of this property to certain types of composite hypotheses which may be reduced thereby to equivalent simple hypotheses.¹ The principal results of this nature may be summed up as follows: If $x$ has distribution of the form $f(x, \theta)$, where $\theta$ is either a location or scale parameter, or a vector denoting both, then there exists, in samples $(x_1, \dots, x_n)$, a set of functions $y_i(x_1, \dots, x_n)$, $i = 1, 2, \dots, p$, $p < n$, having joint distribution $D(y_1, \dots, y_p)$ independent of $\theta$, and such that the converse statement holds, namely, if $\{y_i\}$ have the distribution $D(y_1, \dots, y_p)$, then $x$ has a distribution of the form $f(x, \theta)$. There is a corresponding statement when $x$ has a distribution of the form $f(x - \sum a_iu_i)$, where the $\{a_i\}$ are parameters, and the $\{u_i\}$ are regression variables.

¹ We use the terms simple and composite hypotheses in the sense of Neyman and Pearson [2].
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2. Location and Scale. This section is devoted to the study of functions of 
the sample observations which are such that their distributions determine the 
distribution of x, except possibly for location and scale. 

It will be assumed that associated with x there is a function F{x) such that 
(a) F{x) is monotone non-decreasing, 

(5) F{— x) = lim F{x) — 0, and (c) F(x) = lim Fiji) = 1 

x«e~.aD f«i0 

with the normalization F{x) upper semi-continuous. F{x) is the probability 
that the random variate takes a value less than or equal to x. If F[x) is as- 
sociated with the random variate x we say that x has the distribution F{x). 
If g{x) is a Borel-mcasurable function, the Lebesgue-Stieltjes integral 

g(x) dF(x) is denoted by E[g{x)\. 

The characteristic function v>(0 = E(e'**) determines /’’(x), that is, if 
f e“*dG(x) = / e'‘UF(x), then F(x) = G(x). 


Similarly, let F(xi , • • • , x*) be such that 

(o) F(xi , • • • , x,_i , Xi + h, x <+, , ■■■ , Xk) > F{xi , . • • , X, , • • • , X*) for 
h > 0 and t = 1, 2, • • • , A:; 

(6) lim F(xi, . . . , X*) = 0, t = 1, 2, • . . , A; 

«p 

(c) lim F(xi , • ■ • , Xfc) = 1 ; 

* 1 .* 

with the normalization that $F(x_1, \cdots, x_k)$ is continuous on the right in each $x_i$. If $F(x_1, \cdots, x_k)$ is associated with $x_1, \cdots, x_k$ we say that $x_1, \cdots, x_k$ have the joint distribution $F(x_1, \cdots, x_k)$. As before, $E[H(x_1, \cdots, x_k)] = \int_{R_k} H\, dF$, where $R_k$ is the Euclidean $k$-space. It is well known that under such conditions, given Borel-measurable functions $y_i(x_1, \cdots, x_k)$, $i = 1, \cdots, p$, $p < k$, the function

$$G(y_1, \cdots, y_p) = \int_{R(y)} dF(x_1, \cdots, x_k),$$

where $R(y)$ is the region $[y_1(x_1, \cdots, x_k) \leq y_1, \cdots, y_p(x_1, \cdots, x_k) \leq y_p]$, is again a distribution function satisfying the conditions above. Moreover,

$$\int_R g(y_1, \cdots, y_p)\, dG(y_1, \cdots, y_p) = \int_{R'} g[y_1(x_1, \cdots, x_k), \cdots, y_p(x_1, \cdots, x_k)]\, dF,$$

where $R'$ is the set of all points $(x_1, \cdots, x_k)$ such that $[y_1(x_1, \cdots, x_k), \cdots, y_p(x_1, \cdots, x_k)] \in R$.

If $x$ has distribution $F(x)$, then, by definition, the set $(x_1, \cdots, x_n)$ is a sample from this distribution if $x_1, \cdots, x_n$ have the joint distribution $F(x_1) \cdots F(x_n)$.

The following theorem states that two distributions giving rise, in sampling, to the same distribution of the set $x_1 - x_n, x_2 - x_n, \cdots, x_{n-1} - x_n$, with $n \geq 3$, can differ at most by a translation; that is, the distribution of that set determines the original distribution except for location.

Theorem Ia: Let $x$ have the distribution $F(x)$. Denote by $S$ the set of zeros of $\int e^{itx}\, dF(x)$ and denote by $\epsilon$ the g.l.b. of $|t|$ for $t$ in $S$. Suppose that the complement of $S$ is $\epsilon$-connected.² Suppose that $x'$ has distribution $G(x')$, and let $x_1, \cdots, x_n$ and $x'_1, \cdots, x'_n$ be samples. Then the set $w_\alpha = x_\alpha - x_n$, $\alpha = 1, \cdots, n - 1$, have the same joint distribution as the set $w'_\alpha = x'_\alpha - x'_n$ if and only if there exists a constant $a$ such that $x' + a$ and $x$ have the same distribution.

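Proof: The sufficiency of the condition follows immediately, since $w'_\alpha = x'_\alpha - x'_n = (x'_\alpha + a) - (x'_n + a)$.

In establishing necessity, only the fact that $w_1, w_2$ have the same joint distribution as $w'_1, w'_2$ is needed. This hypothesis implies that

$$E\left(e^{it_1 w_1 + it_2 w_2}\right) = E\left(e^{it_1 w'_1 + it_2 w'_2}\right),$$

that is,

$$E\left(e^{it_1 x_1 + it_2 x_2 - i(t_1 + t_2)x_n}\right) = E\left(e^{it_1 x'_1 + it_2 x'_2 - i(t_1 + t_2)x'_n}\right).$$

Set $\varphi(t) = E(e^{itx})$, $\psi(t) = E(e^{itx'})$. Since the observations are independent, the relation above becomes

$$\varphi(t_1)\varphi(t_2)\varphi(-t_1 - t_2) = \psi(t_1)\psi(t_2)\psi(-t_1 - t_2). \tag{1}$$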

Consider equation (1) for values of $t_1, t_2$ in the neighborhood of $t = 0$. Since $\varphi(0) = \psi(0) = 1$, there is an interval $|t| < \delta$ in which $\varphi(t)$ and $\psi(t)$ do not vanish. It is easily shown that $\varphi(t)$ and $\psi(t)$ are each continuous, since $e^{itx}$, in the neighborhood of $t = 0$, is continuous uniformly for any bounded interval of $x$, and since $A$ may be chosen so that $1 - F(A)$ and $F(-A)$ are both as small as desired. In the interval $|t| < \delta$ the function $f(t) = \varphi(t)/\psi(t)$ is continuous. Also, $\varphi(-t) = \overline{\varphi(t)}$ and $\psi(-t) = \overline{\psi(t)}$. Setting $t_2 = 0$ in (1) we obtain $\varphi(t)\varphi(-t) = \psi(t)\psi(-t)$; hence $|\varphi(t)| = |\psi(t)|$, that is, $|f(t)| = 1$. Thus $f(t)$ takes values on the unit circle of the complex plane, and $f(0) = 1$; hence there is an interval $|t| < \delta'$ such that $z = f(t)$ lies on an arc $\gamma$, of length less than $2\pi$, containing the point $z = 1$. Now consider the functional equation (1) for $|t_1| < \frac{1}{2}\delta'$, $|t_2| < \frac{1}{2}\delta'$. Equation (1) becomes

$$f(t_1)f(t_2)f(-t_1 - t_2) = 1.$$

The interval $|t| < \delta'$ was so chosen that for $|t_1| < \frac{1}{2}\delta'$, $|t_2| < \frac{1}{2}\delta'$, it is possible to define a single-valued branch of the argument of $f(t_1)$, $f(t_2)$, and $f(t_1 + t_2)$. Letting $t_2 = 0$ we have $f(t)f(-t) = 1$; hence, replacing $f(-t_1 - t_2)$ by $1/f(t_1 + t_2)$ in the last equation, we have

$$f(t_1)f(t_2) = f(t_1 + t_2).$$

The values $\arg f(t_1)$, $\arg f(t_2)$, and $\arg f(t_1 + t_2)$ are uniquely determined, except for some fixed multiple of $2\pi$. If we choose the value of the argument which is continuous on $\gamma$ and equal to $0$ at $z = 1$, we must have

$$\arg f(t_1) + \arg f(t_2) = \arg f(t_1 + t_2)$$

for $|t_1| < \frac{1}{2}\delta'$, $|t_2| < \frac{1}{2}\delta'$. Since $\arg f(t)$ is continuous, any solution of this well-known functional equation must be of the form $\arg f(t) = at$. Since $|f(t)| = 1$, there exists a constant $a$ such that $f(t) = e^{iat}$ for $|t| < \frac{1}{2}\delta'$, that is, $\varphi(t) = e^{iat}\psi(t)$ for $|t| < \frac{1}{2}\delta'$. By use of (1) this may be extended to hold for all $t$ such that $|t| < \epsilon$, where $\epsilon$ is the minimum modulus of all $t$ such that $\varphi(t) = 0$.

Equation (1) may now be used to extend the relation to all $t$ such that $\varphi(t) \neq 0$, by choosing an $\epsilon$-chain connecting the origin to the point $t$. If $\varphi(t) = 0$, then also $\psi(t) = 0$, since $|\varphi(t)| = |\psi(t)|$; hence $\varphi(t) = e^{iat}\psi(t)$ holds for all $t$. This relation says that $E(e^{itx}) = E(e^{it(x' + a)})$; hence $x' + a$ and $x$ have the same distribution, thus completing the demonstration of the theorem.

² The set $S$ is $\epsilon$-connected if any two points $p$, $q$ in $S$ can be connected by an $\epsilon$-chain, i.e., there exists a set $p_0 = p, p_1, \cdots, p_{n-1}, p_n = q$ such that $|p_i - p_{i-1}| < \epsilon$, $i = 1, 2, \cdots, n$.

It should be remarked that the set $(x_1 - x_n, \cdots, x_{n-1} - x_n)$ may be replaced in Theorem Ia by any equivalent set, for example, $(x_1 - \bar{x}, \cdots, x_{n-1} - \bar{x})$.
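Here is a minimal illustration of Theorem Ia and of the remark above. The differences $x_\alpha - x_n$ are unchanged by a translation, and the deviations $x_\alpha - \bar{x}$ can be recovered from the differences alone through $\bar{x} = x_n + \frac{1}{n}\sum_{\alpha=1}^{n-1} w_\alpha$. The sketch assumes a small simulated exponential sample and a hypothetical shift of 10; the function names are also hypothetical.

```python
import random


def differences(sample):
    """w_alpha = x_alpha - x_n, alpha = 1, ..., n - 1 (the statistics of Theorem Ia)."""
    last = sample[-1]
    return [x - last for x in sample[:-1]]


def deviations_from_differences(w):
    """Recover x_alpha - xbar (alpha = 1, ..., n - 1) from the differences alone.

    Since xbar = x_n + (w_1 + ... + w_{n-1}) / n, we have x_alpha - xbar = w_alpha - sum(w) / n.
    """
    n = len(w) + 1
    shift = sum(w) / n
    return [wa - shift for wa in w]


random.seed(1)
x = [random.expovariate(1.0) for _ in range(6)]   # illustrative sample
x_shifted = [xi + 10.0 for xi in x]               # hypothetical location change a = 10

w, w_shifted = differences(x), differences(x_shifted)
xbar = sum(x) / len(x)
direct = [xi - xbar for xi in x[:-1]]

print(max(abs(p - q) for p, q in zip(w, w_shifted)))
print(max(abs(p - q) for p, q in zip(direct, deviations_from_differences(w))))
```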

The next result is of the same nature as Theorem Ia except for the replacement of the location parameter by a scale (positive or negative) parameter.


Theorem Ib: Let $x$ have distribution $F(x)$, such that the zeros of

$$\int_{-\infty}^{\infty} e^{it\log|x|}\, dF(x)$$

are nowhere dense, and let $x'$ have distribution $G(x')$. Let $x_1, \cdots, x_n$ and $x'_1, \cdots, x'_n$ be samples from the distributions of $x$ and $x'$, with $n \geq 3$. Then the set $w_\alpha = x_\alpha/x_n$, $\alpha = 1, \cdots, n - 1$, have the same distribution as the set $w'_\alpha = x'_\alpha/x'_n$ if and only if there exists a constant $c$ such that $cx'$ and $x$ have the same distribution.
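The sufficiency half of Theorem Ib rests only on the fact that the ratios $x_\alpha/x_n$ do not change under a change of scale of either sign. A small numerical check might look like the sketch below. It assumes an illustrative normal sample and the arbitrary scale factors $3$ and $-0.5$, both hypothetical.

```python
import random


def ratios(sample):
    """w_alpha = x_alpha / x_n, alpha = 1, ..., n - 1 (the statistics of Theorem Ib)."""
    last = sample[-1]
    return [x / last for x in sample[:-1]]


def max_rel_diff(u, v):
    """Largest relative discrepancy between two equal-length lists."""
    return max(abs(p - q) / max(1.0, abs(p)) for p, q in zip(u, v))


random.seed(2)
x = [random.gauss(1.0, 2.0) for _ in range(5)]    # illustrative sample

for c in (3.0, -0.5):              # a positive and a negative change of scale
    scaled = [c * xi for xi in x]
    print(c, max_rel_diff(ratios(x), ratios(scaled)))
```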

Proof: The sufficiency of the condition is evident. Suppose, then, as before, that $w_1, w_2$ have the same joint distribution as $w'_1, w'_2$. Then $\log|w_1|$ and $\log|w_2|$ have the same joint distribution as $\log|w'_1|$ and $\log|w'_2|$. Apply Theorem Ia to $\log|x|$ and $\log|x'|$; this is allowed because the complement of a nowhere dense set is $\epsilon$-connected for every $\epsilon$. It follows that there exists a constant $a$ such that

$$\int e^{it\log|x|}\, dF(x) = \int e^{it[\log|x'| - a]}\, dG(x').$$

Let $y = e^{-a}x'$; then $|x|$ and $|y|$ have the same distribution, and

$$\int e^{it\log|x|}\, dF(x) = \int e^{it\log|y|}\, dH(y), \tag{2}$$

where $y$ has distribution $H(y)$. We now have to show that either $y$ or $-y$ has the distribution of $x$, that is, it must be shown that either $H(y) = F(y)$ or $H(y) = 1 - F(-y)$.

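Since ratios are unaffected by the change of scale $y = e^{-a}x'$, the functions $u_1 = y_1/y_n$ and $u_2 = y_2/y_n$ have the same joint distribution as $w_1, w_2$. It is clear that the mean value of any function of $u_1$ and $u_2$ is the same as the mean value of the corresponding function of $w_1$ and $w_2$. Hence

$$\iiint e^{it_1\log|w_1| + it_2\log|w_2|}\operatorname{sgn} w_1\operatorname{sgn} w_2\, dF(x_1)\, dF(x_2)\, dF(x_n) = \iiint e^{it_1\log|u_1| + it_2\log|u_2|}\operatorname{sgn} u_1\operatorname{sgn} u_2\, dH(y_1)\, dH(y_2)\, dH(y_n),$$

where $\operatorname{sgn} x = 1$ for $x > 0$, $\operatorname{sgn} x = -1$ for $x < 0$. Since

$$(\operatorname{sgn} w_1)(\operatorname{sgn} w_2) = (\operatorname{sgn} x_1)(\operatorname{sgn} x_2),$$

the last equation becomes

$$\iiint e^{it_1(\log|x_1| - \log|x_n|) + it_2(\log|x_2| - \log|x_n|)}\operatorname{sgn} x_1\operatorname{sgn} x_2\, dF(x_1)\, dF(x_2)\, dF(x_n)$$
$$= \iiint e^{it_1(\log|y_1| - \log|y_n|) + it_2(\log|y_2| - \log|y_n|)}\operatorname{sgn} y_1\operatorname{sgn} y_2\, dH(y_1)\, dH(y_2)\, dH(y_n). \tag{3}$$

Set

$$\varphi_1(t) = \int e^{it\log|x|}\, dF(x), \qquad \varphi_2(t) = \int e^{it\log|x|}\operatorname{sgn} x\, dF(x),$$
$$\psi_1(t) = \int e^{it\log|y|}\, dH(y), \qquad \psi_2(t) = \int e^{it\log|y|}\operatorname{sgn} y\, dH(y).$$

From (3) we have $\varphi_2(t_1)\varphi_2(t_2)\varphi_1(-t_1 - t_2) = \psi_2(t_1)\psi_2(t_2)\psi_1(-t_1 - t_2)$ for all $t_1, t_2$, and from (2) we have $\psi_1(t) = \varphi_1(t)$ for all $t$; hence, if $\varphi_1(-t_1 - t_2) \neq 0$,

$$\varphi_2(t_1)\varphi_2(t_2) = \psi_2(t_1)\psi_2(t_2).$$

By hypothesis the zeros of $\varphi_1(t)$ are nowhere dense; hence if $\varphi_1(-t_1 - t_2) = 0$ there is a sequence $\tau^{(n)}$ such that $\tau^{(n)} \to -t_1 - t_2$ and $\varphi_1(\tau^{(n)}) \neq 0$. Now take an arbitrary sequence $t_1^{(n)}$ such that $t_1^{(n)} \to t_1$; then $t_2^{(n)} = -\tau^{(n)} - t_1^{(n)}$ must tend to $t_2$. For each $n$ we have $\varphi_2(t_1^{(n)})\varphi_2(t_2^{(n)}) = \psi_2(t_1^{(n)})\psi_2(t_2^{(n)})$. All the functions appearing are continuous; thus we see that $\varphi_2(t_1)\varphi_2(t_2) = \psi_2(t_1)\psi_2(t_2)$ for all $t_1, t_2$. From this it follows directly that either $\psi_2(t) = \varphi_2(t)$ for all $t$ or $\psi_2(t) = -\varphi_2(t)$ for all $t$. We have³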


$$\varphi_1(t) = \int_0^{\infty} e^{it\log x}\, dF(x) + \int_{-\infty}^0 e^{it\log(-x)}\, dF(x),$$

$$\varphi_2(t) = \int_0^{\infty} e^{it\log x}\, dF(x) - \int_{-\infty}^0 e^{it\log(-x)}\, dF(x),$$

$$\psi_1(t) = \int_0^{\infty} e^{it\log x}\, dH(x) + \int_{-\infty}^0 e^{it\log(-x)}\, dH(x),$$

and

$$\psi_2(t) = \int_0^{\infty} e^{it\log x}\, dH(x) - \int_{-\infty}^0 e^{it\log(-x)}\, dH(x).$$

³ The assumption has been made implicitly that $F(x)$ and $G(x)$ are continuous at $x = 0$; otherwise the distribution of $x_1/x_n$ is not properly defined, and the functions $\varphi_i(t)$ and $\psi_i(t)$ are then not defined. Similar assumptions will be made whenever necessary in later theorems.

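Combining these expressions with the relations obtained above leads, by Fourier inversion, to the result that either $F(x) \equiv H(x)$ or $H(x) \equiv 1 - F(-x)$. Indeed, $\frac{1}{2}(\varphi_1 \pm \varphi_2)$ and $\frac{1}{2}(\psi_1 \pm \psi_2)$ are the transforms of the parts of $F$ and $H$ carried by $x > 0$ and by $x < 0$. The relations $\varphi_1 = \psi_1$ and $\psi_2 = \pm\varphi_2$ therefore make these parts coincide, possibly after interchanging $x$ and $-x$. We have shown that either $y$ or $-y$ has the same distribution as $x$, that is, either $e^{-a}x'$ or $-e^{-a}x'$ has the same distribution as $x$.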

Theorem Ib states essentially that the joint distribution of the set $x_\alpha/x_n$, $\alpha = 1, \cdots, n - 1$, determines the distribution of $x$ except for a scale parameter and possibly a reflection. In the event that $x$ has an asymmetrical distribution, and if it is desired to rule out negative changes of scale, a variation of this procedure is necessary. The next result is appropriate for this situation.

Theorem Ic: Let $x$ have distribution $F(x)$ such that the zeros of $\int e^{it\log|x|}\, dF(x)$ are nowhere dense, and let $x'$ have distribution $G(x')$. Let $x_1, \cdots, x_n$ and $x'_1, \cdots, x'_n$ be samples from the distributions of $x$ and $x'$, with $n \geq 3$. Express $x_1, \cdots, x_n$ and $x'_1, \cdots, x'_n$ in spherical coordinates:

$$x_1 = r\cos\theta_1, \qquad x'_1 = r'\cos\theta'_1,$$
$$x_2 = r\sin\theta_1\cos\theta_2, \qquad x'_2 = r'\sin\theta'_1\cos\theta'_2,$$
$$\vdots$$
$$x_n = r\sin\theta_1\sin\theta_2\cdots\sin\theta_{n-1}, \qquad x'_n = r'\sin\theta'_1\sin\theta'_2\cdots\sin\theta'_{n-1}.$$

Then $\theta_1, \cdots, \theta_{n-1}$ have the same joint distribution as $\theta'_1, \cdots, \theta'_{n-1}$ if and only if there exists a positive constant $k$ such that $kx'$ and $x$ have the same distribution.
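Theorem Ic uses angular coordinates instead of ratios. These coordinates are invariant under positive changes of scale only. The sketch below computes the angles for the parametrization used in the theorem. It takes $\theta_1, \cdots, \theta_{n-2}$ in $[0, \pi]$ and the last angle from `math.atan2`; that convention is assumed here and is not stated in the text. The sketch then rebuilds the point from $r$ and the angles, and shows that a negative scale factor changes the angles. The helper names and the test vector are illustrative.

```python
import math


def angular_coordinates(x):
    """Angles theta_1, ..., theta_{n-1} of x = (x_1, ..., x_n), where
    x_1 = r cos th_1, x_2 = r sin th_1 cos th_2, ..., x_n = r sin th_1 ... sin th_{n-1}.
    theta_1..theta_{n-2} lie in [0, pi]; theta_{n-1} is taken in (-pi, pi]."""
    angles = []
    for k in range(len(x) - 2):
        tail = math.sqrt(sum(v * v for v in x[k:]))     # equals r sin th_1 ... sin th_k
        cos_k = x[k] / tail if tail > 0 else 1.0
        angles.append(math.acos(max(-1.0, min(1.0, cos_k))))
    angles.append(math.atan2(x[-1], x[-2]))             # last angle carries the sign of x_n
    return angles


def from_angular(r, angles):
    """Inverse map: rebuild (x_1, ..., x_n) from r and the n - 1 angles."""
    x, prod_sin = [], r
    for th in angles:
        x.append(prod_sin * math.cos(th))
        prod_sin *= math.sin(th)
    x.append(prod_sin)
    return x


x = [0.7, -1.2, 2.0, 0.4]
base = angular_coordinates(x)
pos = angular_coordinates([2.5 * v for v in x])    # k = 2.5 > 0: angles unchanged
neg = angular_coordinates([-2.5 * v for v in x])   # negative scale: angles change
print(max(abs(a - b) for a, b in zip(base, pos)))
print(max(abs(a - b) for a, b in zip(base, neg)))
```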

Proof: Sufficiency of the condition is an immediate consequence of the fact that $\theta_1, \cdots, \theta_{n-1}$ are invariant under the transformation $x = kx'$, with $k > 0$. If $\theta_1, \cdots, \theta_{n-1}$ have the same joint distribution as $\theta'_1, \cdots, \theta'_{n-1}$, then the set $\{x_\alpha/x_n\}$ have the same joint distribution as the set $\{x'_\alpha/x'_n\}$; hence, by Theorem Ib, there exists a constant $c$ such that $cx'$ has the same distribution as $x$. To establish necessity of the condition we must show that $|c|x'$ has the same distribution as $x$.

Set $y = |c|x'$, and let $y_1, \cdots, y_n$ be expressed in spherical coordinates; $y_1, \cdots, y_n$ have the same angular coordinates $\theta'_1, \cdots, \theta'_{n-1}$ as $x'_1, \cdots, x'_n$. This implies that $x_1/r$ and $x_2/r$ have the same joint distribution as $y_1/R$ and $y_2/R$, where $R^2 = y_1^2 + \cdots + y_n^2$; therefore $x_1/|x_2|$ has the same distribution as $y_1/|y_2|$, so that

$$\iint e^{it\log|x_1/x_2|}\operatorname{sgn} x_1\, dF(x_1)\, dF(x_2) = \iint e^{it\log|y_1/y_2|}\operatorname{sgn} y_1\, dH(y_1)\, dH(y_2)$$

if $y$ has distribution $H(y)$. Since $x_1/|x_2| = |x_1/x_2|\operatorname{sgn} x_1$, the last equation yields

$$\int e^{it\log|x|}\operatorname{sgn} x\, dF(x) \cdot \int e^{-it\log|x|}\, dF(x) = \int e^{it\log|x|}\operatorname{sgn} x\, dH(x) \cdot \int e^{-it\log|x|}\, dH(x).$$

We know already that $|x|$ and $|y|$ have the same distribution, so that

$$\int_{-\infty}^{\infty} e^{it\log|x|}\, dF(x) = \int_{-\infty}^{\infty} e^{it\log|x|}\, dH(x), \tag{4}$$

thus

$$\int_{-\infty}^{\infty} e^{it\log|x|}\operatorname{sgn} x\, dF(x) = \int_{-\infty}^{\infty} e^{it\log|x|}\operatorname{sgn} x\, dH(x), \tag{5}$$

except possibly for zeros of $\int_{-\infty}^{\infty} e^{-it\log|x|}\, dF(x)$. By hypothesis the exceptional points are nowhere dense, so that, by continuity, (5) holds for all $t$. Relations (4) and (5) together imply, as in the proof of Theorem Ib, that $F(x) \equiv H(x)$, i.e., $x$ and $|c|x'$ have the same distribution.

The next three results are generalizations of Theorems Ia, Ib, Ic to analogous multivariate situations. The first of these is a direct generalization of Theorem Ia.

Theorem IIa: Let $x_1, \cdots, x_k$ have joint distribution $F(x_1, \cdots, x_k)$ such that the complement of the set $S$ of zeros of $\int e^{i\sum_r t_r x_r}\, dF(x_1, \cdots, x_k)$ is $\epsilon$-connected, where $\epsilon$ is the g.l.b. of $|t|$ for $(t)$ in $S$, and let $y_1, \cdots, y_k$ have joint distribution $G(y_1, \cdots, y_k)$. Let $(x_1^\alpha, \cdots, x_k^\alpha)$ and $(y_1^\alpha, \cdots, y_k^\alpha)$, $\alpha = 1, \cdots, n$, be samples from these distributions, with $n \geq 3$. Then $w_{i\beta} = x_i^\beta - x_i^n$, $i = 1, \cdots, k$, $\beta = 1, \cdots, n - 1$, have the same joint distribution as the corresponding set $v_{i\beta} = y_i^\beta - y_i^n$ if and only if there exist constants $a_1, \cdots, a_k$ such that $y_1 + a_1, \cdots, y_k + a_k$ have the same joint distribution as $x_1, \cdots, x_k$.

Proof: Set

$$\varphi(t_1, \cdots, t_k) = \int \cdots \int e^{i\sum_r t_r x_r}\, dF(x_1, \cdots, x_k),$$

$$\psi(t_1, \cdots, t_k) = \int \cdots \int e^{i\sum_r t_r y_r}\, dG(y_1, \cdots, y_k).$$

If $w_{i\beta}$, $i = 1, \cdots, k$, $\beta = 1, 2$, have the same joint distribution as $v_{i\beta}$, then, as in the proof of Theorem Ia, we have

$$\varphi(t_{11}, \cdots, t_{k1})\varphi(t_{12}, \cdots, t_{k2})\varphi(-t_{11} - t_{12}, \cdots, -t_{k1} - t_{k2})$$
$$= \psi(t_{11}, \cdots, t_{k1})\psi(t_{12}, \cdots, t_{k2})\psi(-t_{11} - t_{12}, \cdots, -t_{k1} - t_{k2}). \tag{6}$$

Again, as before, $|\varphi| = |\psi|$; $\varphi(t_1, \cdots, t_k)$ and $\psi(t_1, \cdots, t_k)$ are continuous; $\varphi(0, 0, \cdots, 0) = \psi(0, 0, \cdots, 0) = 1$. There will exist a neighborhood $N$ of $(0, 0, \cdots, 0)$ such that for $(t_1, \cdots, t_k) \in N$ the function $f(t_1, \cdots, t_k) = \varphi(t_1, \cdots, t_k)/\psi(t_1, \cdots, t_k)$ is defined and continuous. Then there will exist a neighborhood $N' \subset N$ with the following properties. In $N'$ there is a uniquely determined branch of $\arg f(t_1, \cdots, t_k)$, continuous in $N'$. Moreover, if $(t_1, \cdots, t_k) \in N'$ and $(u_1, \cdots, u_k) \in N'$, then $\arg f(t_1 + u_1, \cdots, t_k + u_k)$ is also uniquely determined and continuous. For $(t) \in N'$ and $(u) \in N'$, $\arg f$ satisfies the relation

$$\arg f(t_1, \cdots, t_k) + \arg f(u_1, \cdots, u_k) = \arg f(t_1 + u_1, \cdots, t_k + u_k).$$

It is easily shown that any continuous function satisfying the equation above must be of the form $\sum_r a_r t_r$; therefore

$$\varphi(t_1, \cdots, t_k) = e^{i\sum_r a_r t_r}\psi(t_1, \cdots, t_k), \qquad (t_1, \cdots, t_k) \in N'. \tag{7}$$

Just as in the proof of Theorem Ia the relation (7) may be extended, by use of (6), to hold for all $t$. This implies, finally, that the set $\{y_i + a_i\}$ have the same joint distribution as the set $\{x_i\}$.

Theorem IIb is a generalization of Theorem Ib to multivariate distributions.

Theorem IIb: Let $x_1, \cdots, x_k$ have distribution $F(x_1, \cdots, x_k)$ such that the zeros of $\int e^{i\sum_r t_r\log|x_r|}\, dF(x_1, \cdots, x_k)$ are nowhere dense, and let $y_1, \cdots, y_k$ have distribution $G(y_1, \cdots, y_k)$. Let $(x_1^\alpha, \cdots, x_k^\alpha)$ and $(y_1^\alpha, \cdots, y_k^\alpha)$, $\alpha = 1, \cdots, n$, be samples, with $n \geq 3$. Then the set $w_{i\beta} = x_i^\beta/x_i^n$, $i = 1, \cdots, k$, $\beta = 1, \cdots, n - 1$, have the same joint distribution as the corresponding set $v_{i\beta} = y_i^\beta/y_i^n$ if and only if there exist constants $c_1, \cdots, c_k$ such that the set $c_i y_i$ have the same distribution as the $x_i$.

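Proof: The demonstration is parallel to that of Theorem Ib. By Theorem IIa there exist $a_1, \cdots, a_k$ such that

$$E\left(e^{i\sum_r t_r\log|x_r|}\right) = E\left(e^{i\sum_r t_r(\log|y_r| + a_r)}\right).$$

Set $z_r = e^{a_r}y_r$; then

$$\int e^{i\sum_r t_r\log|x_r|}\, dF(x_1, \cdots, x_k) = \int e^{i\sum_r t_r\log|z_r|}\, dH(z_1, \cdots, z_k), \tag{8}$$

where $(z_1, \cdots, z_k)$ have distribution function $H(z_1, \cdots, z_k)$.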

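We shall continue the proof from here under the assumption that $k = 2$. It will be evident how the proof goes for any $k$. Since the ratios $z_i^\beta/z_i^n$ have the same joint distribution as the ratios $x_i^\beta/x_i^n$, we have

$$\iiint e^{i\sum_{r=1}^{2}\sum_{\beta=1}^{2} t_{r\beta}\log|x_r^\beta/x_r^n|}\operatorname{sgn}\frac{x_1^1}{x_1^n}\operatorname{sgn}\frac{x_1^2}{x_1^n}\, dF(x_1^1, x_2^1)\, dF(x_1^2, x_2^2)\, dF(x_1^n, x_2^n)$$
$$= \iiint e^{i\sum_{r=1}^{2}\sum_{\beta=1}^{2} t_{r\beta}\log|z_r^\beta/z_r^n|}\operatorname{sgn}\frac{z_1^1}{z_1^n}\operatorname{sgn}\frac{z_1^2}{z_1^n}\, dH(z_1^1, z_2^1)\, dH(z_1^2, z_2^2)\, dH(z_1^n, z_2^n). \tag{9}$$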
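Both members of (9) are evaluated as products, just as was done in previous proofs, and from the result, combined with (8), we conclude, as in Theorem Ib, that

$$\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} e^{i(t_1\log|x_1| + t_2\log|x_2|)}\operatorname{sgn} x_1\, dF(x_1, x_2) = s_1\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} e^{i(t_1\log|z_1| + t_2\log|z_2|)}\operatorname{sgn} z_1\, dH(z_1, z_2),$$

where $s_1 = \pm 1$, for all $(t_1, t_2)$. Similarly

$$\iint e^{i(t_1\log|x_1| + t_2\log|x_2|)}\operatorname{sgn} x_2\, dF(x_1, x_2) = s_2\iint e^{i(t_1\log|z_1| + t_2\log|z_2|)}\operatorname{sgn} z_2\, dH(z_1, z_2)$$

and

$$\iint e^{i(t_1\log|x_1| + t_2\log|x_2|)}\operatorname{sgn} x_1\operatorname{sgn} x_2\, dF(x_1, x_2) = s_3\iint e^{i(t_1\log|z_1| + t_2\log|z_2|)}\operatorname{sgn} z_1\operatorname{sgn} z_2\, dH(z_1, z_2),$$

with $s_2 = \pm 1$, $s_3 = \pm 1$. Set

$$\varphi_1(t_1, t_2) = \iint e^{i(t_1\log|x_1| + t_2\log|x_2|)}\operatorname{sgn} x_1\, dF(x_1, x_2),$$
$$\varphi_2(t_1, t_2) = \iint e^{i(t_1\log|x_1| + t_2\log|x_2|)}\operatorname{sgn} x_2\, dF(x_1, x_2),$$
$$\varphi_{12}(t_1, t_2) = \iint e^{i(t_1\log|x_1| + t_2\log|x_2|)}\operatorname{sgn} x_1\operatorname{sgn} x_2\, dF(x_1, x_2),$$

and let $\psi_1(t_1, t_2)$, $\psi_2(t_1, t_2)$, and $\psi_{12}(t_1, t_2)$ denote the corresponding transforms of $H(z_1, z_2)$. We have

$$\varphi_1(t_1, t_2) = s_1\psi_1(t_1, t_2), \qquad \varphi_2(t_1, t_2) = s_2\psi_2(t_1, t_2), \qquad \varphi_{12}(t_1, t_2) = s_3\psi_{12}(t_1, t_2), \tag{10}$$

with $s_1 = \pm 1$, $s_2 = \pm 1$, and $s_3 = \pm 1$.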


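Now, as in (9), consider the mean value of $e^{i\sum_{r,\beta} t_{r\beta}\log|x_r^\beta/x_r^n|}\operatorname{sgn}(x_1^1/x_1^n)\operatorname{sgn}(x_2^2/x_2^n)$. This yields the relation

$$\varphi_1(t_{11}, t_{21})\varphi_2(t_{12}, t_{22})\varphi_{12}(-t_{11} - t_{12}, -t_{21} - t_{22}) = \psi_1(t_{11}, t_{21})\psi_2(t_{12}, t_{22})\psi_{12}(-t_{11} - t_{12}, -t_{21} - t_{22}),$$

showing that $s_1, s_2, s_3$ may be chosen so that $s_1 s_2 s_3 = 1$, that is, $s_1 s_2 = s_3$.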





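Consider now the variates $z'_r = s_r z_r$, $r = 1, 2$. Let $K(z'_1, z'_2)$ be the distribution function of $z'_1, z'_2$. Let $\theta_1(t_1, t_2)$, $\theta_2(t_1, t_2)$, and $\theta_{12}(t_1, t_2)$ be the transforms of $K$ which correspond to $\varphi_1(t_1, t_2)$, $\varphi_2(t_1, t_2)$, and $\varphi_{12}(t_1, t_2)$ respectively. Then it is evident that

$$\varphi_1(t_1, t_2) = \theta_1(t_1, t_2), \qquad \varphi_2(t_1, t_2) = \theta_2(t_1, t_2), \qquad \varphi_{12}(t_1, t_2) = \theta_{12}(t_1, t_2). \tag{11}$$

Moreover, from (8),

$$\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} e^{i(t_1\log|x_1| + t_2\log|x_2|)}\, dF(x_1, x_2) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} e^{i(t_1\log|z_1| + t_2\log|z_2|)}\, dK(z_1, z_2).$$

The last relation, together with the equations (11), implies that $F$ and $K$ coincide in each quadrant; thus $F(x_1, x_2) \equiv K(x_1, x_2)$ for all $x_1, x_2$.

The final result is that $z'_1, z'_2$ have the same distribution as $x_1, x_2$, i.e., $s_1 e^{a_1}y_1$ and $s_2 e^{a_2}y_2$ have the same joint distribution as $x_1$ and $x_2$.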

The next result bears the same relation to Theorem IIb that Theorem Ic bears to Theorem Ib; that is, only positive scale changes are to be permitted.


Theorem IIc: Let $x_1, \cdots, x_k$ have distribution $F(x_1, \cdots, x_k)$ such that the zeros of $\int e^{i\sum_r t_r\log|x_r|}\, dF(x_1, \cdots, x_k)$ are nowhere dense, and let $y_1, \cdots, y_k$ have distribution $G(y_1, \cdots, y_k)$. Let $(x_1^\alpha, \cdots, x_k^\alpha)$ and $(y_1^\alpha, \cdots, y_k^\alpha)$, $\alpha = 1, 2, \cdots, n$, be samples with $n \geq 3$. Express $x_i^1, \cdots, x_i^n$ and $y_i^1, \cdots, y_i^n$ in spherical coordinates:

$$x_i^1 = r_i\cos\theta_i^1, \qquad y_i^1 = R_i\cos\varphi_i^1,$$
$$x_i^2 = r_i\sin\theta_i^1\cos\theta_i^2, \qquad y_i^2 = R_i\sin\varphi_i^1\cos\varphi_i^2,$$
$$\vdots$$
$$x_i^n = r_i\sin\theta_i^1\cdots\sin\theta_i^{n-1}, \qquad y_i^n = R_i\sin\varphi_i^1\cdots\sin\varphi_i^{n-1}.$$

Then $\{\theta_i^\beta\}$, $i = 1, \cdots, k$, $\beta = 1, \cdots, n - 1$, have the same joint distribution as $\{\varphi_i^\beta\}$ if and only if there exist constants $k_i > 0$, $i = 1, \cdots, k$, such that the set $k_i y_i$ have the same joint distribution as the set $x_i$.

Proof: If $\{\theta_i^\beta\}$ have the same distribution as $\{\varphi_i^\beta\}$, then it follows that $\{x_i^\beta/x_i^n\}$ have the same distribution as $\{y_i^\beta/y_i^n\}$; hence by Theorem IIb there exist constants $c_i$ such that $\{c_i y_i\}$ have the same distribution as $\{x_i\}$. Set $z_i = |c_i|y_i$; we wish to show that $\{z_i\}$ have the same distribution as $\{x_i\}$. By equation (8) in Theorem IIb it is known that $\{|z_i|\}$ have the same distribution as $\{|x_i|\}$. Moreover, if we express $z_i^1, \cdots, z_i^n$ in spherical coordinates, the angular coordinates are the same as those of $y_i^1, \cdots, y_i^n$. Therefore $\{z_i^1/|z_i^2|\}$ have the same distribution as $\{x_i^1/|x_i^2|\}$, since these functions are obtainable in terms of the angular coordinates.

As before, we shall continue the proof from here under the assumption that $k = 2$. The procedure is a generalization of the procedure in the proof of Theorem Ic. Since $\operatorname{sgn} x_i^1 = \operatorname{sgn}(x_i^1/|x_i^2|)$, and similarly for $z$,

$$\iint e^{i\sum_{r=1}^{2} t_r\log|x_r^1/x_r^2|}\operatorname{sgn} x_i^1\, dF(x_1^1, x_2^1)\, dF(x_1^2, x_2^2)$$
$$= \iint e^{i\sum_{r=1}^{2} t_r\log|z_r^1/z_r^2|}\operatorname{sgn} z_i^1\, dH(z_1^1, z_2^1)\, dH(z_1^2, z_2^2), \qquad i = 1, 2, \tag{12}$$

where it is assumed that $z_1, z_2$ have distribution $H(z_1, z_2)$. As before, set

$$\varphi(t_1, t_2) = \iint e^{i(t_1\log|x_1| + t_2\log|x_2|)}\, dF(x_1, x_2),$$
$$\varphi_i(t_1, t_2) = \iint e^{i(t_1\log|x_1| + t_2\log|x_2|)}\operatorname{sgn} x_i\, dF(x_1, x_2), \qquad i = 1, 2,$$
$$\varphi_{12}(t_1, t_2) = \iint e^{i(t_1\log|x_1| + t_2\log|x_2|)}\operatorname{sgn} x_1\operatorname{sgn} x_2\, dF(x_1, x_2),$$

and denote the corresponding transforms of $H(z_1, z_2)$ by $\theta(t_1, t_2)$, $\theta_1(t_1, t_2)$, $\theta_2(t_1, t_2)$, and $\theta_{12}(t_1, t_2)$. It has been remarked already that $\{|z_i|\}$ have the same distribution as $\{|x_i|\}$; therefore $\theta(t_1, t_2) = \varphi(t_1, t_2)$. Equation (12) yields the relation $\varphi_i(t_1, t_2)\varphi(-t_1, -t_2) = \theta_i(t_1, t_2)\theta(-t_1, -t_2)$, $i = 1, 2$. The zeros of $\varphi(t_1, t_2)$ are nowhere dense, so it can be concluded that $\varphi_i(t_1, t_2) = \theta_i(t_1, t_2)$, $i = 1, 2$. Now, from an equation similar to (12) we obtain $\varphi_{12}(t_1, t_2) = \theta_{12}(t_1, t_2)$. As in Theorem IIb, the four relations above together imply that $F(x_1, x_2) \equiv H(x_1, x_2)$; in other words, $\{|c_i|y_i\}$ have the same distribution as $\{x_i\}$.
We are now in a position to combine some of the preceding theorems so as to obtain analogous results for scale and location parameters together.

Theorem IIIa: Let $x$ have distribution $F(x)$ such that the zeros of $\int e^{itx}\, dF(x)$ satisfy the condition of Theorem Ia, and the zeros of

$$\iiint e^{it_1\log|x_1 - x_3| + it_2\log|x_2 - x_3|}\, dF(x_1)\, dF(x_2)\, dF(x_3)$$

are nowhere dense, and let $y$ have distribution $G(y)$. Let $x_1, \cdots, x_n$ and $y_1, \cdots, y_n$ be samples, with $n \geq 9$. Then

$$w_\alpha = \frac{x_\alpha - x_n}{x_{n-1} - x_n}, \qquad \alpha = 1, \cdots, n - 2,$$

have the same joint distribution as the corresponding set $w'_\alpha = \dfrac{y_\alpha - y_n}{y_{n-1} - y_n}$ if and only if there exist constants $a$, $c$ such that $c(y - a)$ and $x$ have the same distribution.
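The statistics of Theorem IIIa remove both location and scale. The following sketch confirms numerically that $w_\alpha$ is unchanged by $y = c(x - a)$. It assumes an arbitrary five-point sample and two hypothetical pairs $(a, c)$, one of them with $c < 0$.

```python
def affine_invariants(sample):
    """w_alpha = (x_alpha - x_n) / (x_{n-1} - x_n), alpha = 1, ..., n - 2 (Theorem IIIa)."""
    last, prev = sample[-1], sample[-2]
    return [(x - last) / (prev - last) for x in sample[:-2]]


x = [1.3, -0.4, 2.2, 0.9, -1.1]              # illustrative sample, n = 5
for a, c in ((5.0, 2.0), (-3.0, -0.25)):     # hypothetical (a, c); c may be negative
    y = [c * (xi - a) for xi in x]
    diff = max(abs(p - q) for p, q in zip(affine_invariants(x), affine_invariants(y)))
    print(a, c, diff)
```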
Proof: Sufficiency of the condition is an immediate consequence of the fact that $w'_\alpha$ is invariant under transformations of the form $y' = c(y - a)$. Assume then that $\{w_\alpha\}$ and $\{w'_\alpha\}$ have the same joint distribution. By elementary transformations it is evident that the functions

$$\frac{x_1 - x_3}{x_7 - x_9}, \qquad \frac{x_4 - x_6}{x_7 - x_9}, \qquad \frac{x_2 - x_3}{x_8 - x_9}, \qquad \frac{x_5 - x_6}{x_8 - x_9}$$

have the same joint distribution as the corresponding functions of the $y$'s, if $n \geq 9$. Since $x_1, \cdots, x_n$ form a sample, it follows that the pairs $(x_1 - x_3, x_2 - x_3)$, $(x_4 - x_6, x_5 - x_6)$, $(x_7 - x_9, x_8 - x_9)$ have the same joint distributions and are pairwise independent, and similarly for the corresponding functions of the $y$'s. Theorem IIb assures the existence of constants $c_1$, $c_2$ such that $c_1(y_1 - y_3)$, $c_2(y_2 - y_3)$ have the same joint distribution as $(x_1 - x_3)$, $(x_2 - x_3)$. Considering separately the marginal distributions it is seen that $c_1(y_1 - y_3)$ has the same distribution as $c_2(y_2 - y_3)$. Since $y_1 - y_3$ and $y_2 - y_3$ have the same distribution, either $c_1 = c_2$ or $c_2 = -c_1$. Set $u_\alpha = x_\alpha - x_3$, $v_\alpha = c_1(y_\alpha - y_3)$, $\alpha = 1, 2$. For the distributions of $(u_1, u_2)$ and $(v_1, v_2)$ we have relations corresponding to (10) in Theorem IIb. Because of the symmetry in the variables, they hold with the additional condition that $s_1 = s_2$. This implies that either $(v_1, v_2)$ or $(-v_1, -v_2)$ have the same joint distribution as $(u_1, u_2)$, that is, there exists $c$ such that $c(y_1 - y_3)$ and $c(y_2 - y_3)$ have the same joint distribution as $x_1 - x_3$ and $x_2 - x_3$. Application of Theorem Ia now completes the proof.

Just as before, there is an analogous situation when we consider angular coordinates instead of quotients. The proof is immediate: the angular coordinates determine the angular coordinates of $\{x_1 - x_3, x_2 - x_3\}$, $\{x_4 - x_6, x_5 - x_6\}$, and $\{x_7 - x_9, x_8 - x_9\}$, arranged as a sample. Then the constants $c_1$, $c_2$ in the proof of Theorem IIIa are both positive; it follows that $c_1 = c_2$. Application of Theorem Ia gives

Theorem IIIb: Let $x_1, \cdots, x_n$ and $y_1, \cdots, y_n$ satisfy the hypotheses of Theorem IIIa. Set

$$x_1 - x_n = r\cos\theta_1, \qquad y_1 - y_n = r'\cos\theta'_1,$$
$$x_2 - x_n = r\sin\theta_1\cos\theta_2, \qquad y_2 - y_n = r'\sin\theta'_1\cos\theta'_2,$$
$$\vdots$$
$$x_{n-1} - x_n = r\sin\theta_1\cdots\sin\theta_{n-2}, \qquad y_{n-1} - y_n = r'\sin\theta'_1\cdots\sin\theta'_{n-2}.$$

Then $\theta_1, \cdots, \theta_{n-2}$ have the same joint distribution as $\theta'_1, \cdots, \theta'_{n-2}$ if and only if there exist constants $a$ and $c > 0$ such that $c(y - a)$ has the same distribution as $x$.

Theorem IVa is a generalization of Theorem Ia to cover arbitrary linear combinations of some subset of the sample.

Theorem IVa: Suppose $x$ has distribution $F(x)$ such that $\int e^{itx}\, dF(x)$ does not vanish, and let $y$ have distribution $G(y)$. Consider the functions

$$w_\alpha = x_\alpha - \sum_{\beta=1}^{n-m} l_{\alpha\beta}x_{m+\beta}, \qquad w'_\alpha = y_\alpha - \sum_{\beta=1}^{n-m} l_{\alpha\beta}y_{m+\beta}, \qquad \alpha = 1, 2, \cdots, m,$$

and suppose that $m > n - m$. If $\{w_\alpha\}$ have the same joint distribution as $\{w'_\alpha\}$, two cases arise. If $\sum_{\beta=1}^{n-m} l_{\alpha\beta} \neq 1$ for some $\alpha$, it follows that $F(y) \equiv G(y)$. If $\sum_{\beta=1}^{n-m} l_{\alpha\beta} = 1$ for all $\alpha$, there exists a constant $a$ such that $F(y - a) \equiv G(y)$.
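The role of the condition $\sum_\beta l_{\alpha\beta} = 1$ in Theorem IVa can be seen directly: a translation by $a$ changes $w_\alpha$ by $a\left(1 - \sum_\beta l_{\alpha\beta}\right)$. The sketch below computes this shift numerically with $n = 5$, $m = 3$ and two hypothetical coefficient matrices. All values are illustrative assumptions.

```python
def linear_statistics(sample, l, m):
    """w_alpha = x_alpha - sum_beta l[alpha][beta] * x_{m+beta}, alpha = 1, ..., m.

    `l` is an m x (n - m) list of coefficients (0-based indices in the code)."""
    tail = sample[m:]                                  # x_{m+1}, ..., x_n
    return [sample[a] - sum(l[a][b] * tail[b] for b in range(len(tail)))
            for a in range(m)]


x = [0.8, -1.5, 2.1, 0.3, 1.7]                    # n = 5 (illustrative values)
m, shift = 3, 4.0                                 # hypothetical translation a = 4
l_unit = [[0.5, 0.5], [1.0, 0.0], [0.25, 0.75]]   # every row sums to 1
l_other = [[0.5, 0.0], [1.0, 0.0], [0.0, 0.0]]    # row sums 0.5, 1 and 0

x_shifted = [xi + shift for xi in x]
for l in (l_unit, l_other):
    before = linear_statistics(x, l, m)
    after = linear_statistics(x_shifted, l, m)
    # each w_alpha moves by shift * (1 - sum_beta l[alpha][beta])
    print([round(q - p, 12) for p, q in zip(before, after)])
```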

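Proof: Denote the characteristic functions of $x$ and $y$ by $\varphi(t)$ and $\psi(t)$ respectively. By expressing the fact that $\{w_\alpha\}$ and $\{w'_\alpha\}$, $\alpha = 1, 2, \cdots, n - m + 1$, have the same characteristic function we obtain the functional equation

$$\prod_{\alpha=1}^{n-m+1}\varphi(t_\alpha)\prod_{\beta=1}^{n-m}\varphi\left(-\sum_{\alpha=1}^{n-m+1} l_{\alpha\beta}t_\alpha\right) = \prod_{\alpha=1}^{n-m+1}\psi(t_\alpha)\prod_{\beta=1}^{n-m}\psi\left(-\sum_{\alpha=1}^{n-m+1} l_{\alpha\beta}t_\alpha\right).$$

By hypothesis $\varphi(t)$ does not vanish; therefore $\psi(t)$ has no zeros, because of the relation above. The functions $\varphi(t)$ and $\psi(t)$ are continuous, thus the function $f(t) = \log\varphi(t) - \log\psi(t)$ can be uniquely defined in a continuous manner for all $t$, with $f(0) = 0$. The equation above becomes

$$\sum_{\alpha=1}^{n-m+1} f(t_\alpha) + \sum_{\beta=1}^{n-m} f\left(-\sum_{\alpha=1}^{n-m+1} l_{\alpha\beta}t_\alpha\right) = 0. \tag{13}$$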


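The constants $l_{\alpha\beta}$ are necessarily linearly dependent: there are $n - m + 1$ rows $(l_{\alpha 1}, \cdots, l_{\alpha, n-m})$ in a space of $n - m$ dimensions. So, for some $\alpha$, the row $l_{\alpha\beta}$ can be expressed as a linear combination of the others; suppose then that

$$l_{n-m+1,\beta} = \sum_{\alpha=1}^{n-m} c_\alpha l_{\alpha\beta}, \qquad \beta = 1, \cdots, n - m.$$

Putting these values in (13) we have

$$\sum_{\alpha=1}^{n-m+1} f(t_\alpha) + \sum_{\beta=1}^{n-m} f\left(-\sum_{\alpha=1}^{n-m} l_{\alpha\beta}(t_\alpha + c_\alpha t_{n-m+1})\right) = 0. \tag{14}$$

It can be assumed that $\sum_\alpha c_\alpha^2 \neq 0$. For if $c_\alpha = 0$ for all $\alpha$, we have $l_{n-m+1,\beta} = 0$, $\beta = 1, \cdots, n - m$, that is, $w'_{n-m+1} = y_{n-m+1}$ and $w_{n-m+1} = x_{n-m+1}$; hence $x$ and $y$ have the same distribution. Assuming $c_1 \neq 0$, set $t_\alpha = -c_\alpha t_{n-m+1}$, $\alpha = 2, \cdots, n - m$, in (14), obtaining

$$f(t_1) + \sum_{\alpha=2}^{n-m} f(-c_\alpha t_{n-m+1}) + f(t_{n-m+1}) + \sum_{\beta=1}^{n-m} f\left(-l_{1\beta}(t_1 + c_1 t_{n-m+1})\right) = 0. \tag{15}$$

Now, recalling that $f(0) = 0$, set $t_{n-m+1} = 0$, getting $f(t_1) + \sum_{\beta=1}^{n-m} f(-l_{1\beta}t_1) = 0$. Evaluating this with argument $t_1 + c_1 t_{n-m+1}$ and substituting back in (15), it appears that

$$f(t_1) + f(t_{n-m+1}) + \sum_{\alpha=2}^{n-m} f(-c_\alpha t_{n-m+1}) = f(t_1 + c_1 t_{n-m+1}). \tag{16}$$

Now setting $t_1 = 0$ in (16) we have the relation

$$f(t_{n-m+1}) + \sum_{\alpha=2}^{n-m} f(-c_\alpha t_{n-m+1}) = f(c_1 t_{n-m+1});$$

thus we have finally $f(t_1) + f(c_1 t_{n-m+1}) = f(t_1 + c_1 t_{n-m+1})$, or, since $c_1 \neq 0$, $f(t_1 + t_2) = f(t_1) + f(t_2)$. The last relation implies that $f(t) = ct$, since $f(t)$ is continuous. Now replace $f(t)$ by $ct$ in (13), getting

$$c\sum_{\alpha=1}^{n-m+1} t_\alpha\left(1 - \sum_{\beta=1}^{n-m} l_{\alpha\beta}\right) = 0,$$

that is, either $c = 0$, or $\sum_{\beta=1}^{n-m} l_{\alpha\beta} = 1$ for all $\alpha$. We conclude then that $\varphi(t) = \psi(t)$, unless $\sum_\beta l_{\alpha\beta} = 1$ for all $\alpha$. If $\sum_\beta l_{\alpha\beta} = 1$ for all $\alpha$ we have $\varphi(t) = e^{ct}\psi(t)$. Since $\varphi(-t) = \overline{\varphi(t)}$ and $\psi(-t) = \overline{\psi(t)}$, $c$ is of the form $c = ia$, where $a$ is real; in other words $\varphi(t) = e^{iat}\psi(t)$, thus concluding the proof of the theorem.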

It was assumed in Theorem IVa that $\varphi(t)$ has no zeros. If $\varphi(t)$ has zeros we have proved that, for an interval $|t| < \epsilon$, $\varphi(t) = \psi(t)$ (or $\varphi(t) = e^{iat}\psi(t)$). This does not necessarily imply the result of Theorem IVa, but it does imply at least that if the $k$th moments of $x$ and of $y$ (or of $y - a$) both exist they are equal.

The last result in this series can be proved by methods similar to those used 
in Theorem IVa. 


Theorem IVb: Let $x$ and $y$ satisfy the hypotheses of Theorem IVa. Suppose, moreover, that $m > 2(n - m)$, that the rank of $\|l_{\alpha\beta}\|$ is $n - m$, and that $\sum_{\beta=1}^{n-m} l_{\alpha\beta} \neq 1$ for at least $2m - n$ values of $\alpha$. Then, if there exist constants $\{c_\alpha\}$ such that the set $\{c_\alpha w'_\alpha\}$ have the same joint distribution as $\{w_\alpha\}$, it follows that, for some $c$, $cy$ has the same distribution as $x$.


3. Application to Composite Hypotheses. The results of section 2 have a significant application in the theory of testing composite hypotheses. Suppose that $x$ has a distribution of the form $F(x, \theta_1, \theta_2)$, and that the hypothesis $\theta_2 = \theta_2^0$ is to be tested, without reference to the value of $\theta_1$. We assume that the parameters are independent, i.e., $F(x, \theta_1, \theta_2) \equiv F(x, \theta'_1, \theta'_2)$ implies that $\theta_1 = \theta'_1$ and $\theta_2 = \theta'_2$. In a wide class of important cases the following holds. Given a sample $x_1, \cdots, x_n$ from the distribution $F(x, \theta_1, \theta_2)$, there exist functions $y_\alpha(x_1, \cdots, x_n)$, $\alpha = 1, 2, \cdots, p$, such that $\{y_\alpha\}$ have a joint distribution independent of $\theta_1$ but depending on $\theta_2$. Now suppose the $\{y_\alpha\}$ are such that their joint distribution redetermines the original distribution, except for $\theta_1$. Then one can reasonably use the $p$-dimensional distribution of the $\{y_\alpha\}$ for testing the hypothesis $\theta_2 = \theta_2^0$, thus reducing the composite hypothesis to a simple hypothesis. In testing this simple hypothesis, every alternative hypothesis (corresponding to a value of $\theta_2$) determines a distribution of $x$ among the alternatives $F(x, \theta_1, \theta_2)$, except for the unknown $\theta_1$. That is, there is a one-to-one correspondence between the two sets of alternative hypotheses. It is expressed by the fact that if $\theta'_2 \neq \theta''_2$, then the distributions of the set $\{y_\alpha\}$ corresponding to $\theta_2 = \theta'_2$ and $\theta_2 = \theta''_2$ must be different.

Suppose, for example, that it is desired to test whether $y = x - a$ for some $a$ has the distribution $F(y, \theta^0)$, with the assumption that, for some $a$, $y$ has the distribution $F(y, \theta)$. Given a sample one can form the set $w_\alpha = x_\alpha - x_n$, $\alpha = 1, 2, \cdots, n - 1$, obtaining the distribution $G(w_1, \cdots, w_{n-1}, \theta)$. Now consider the simple hypothesis $\theta = \theta^0$, knowing that $G$ determines $\theta$, by Theorem Ia. Similarly one can test whether $cx$, for some $c \neq 0$, has distribution $F(y, \theta^0)$. If both positive and negative values of $c$ are allowed, form $w_\alpha = x_\alpha/x_n$, $\alpha = 1, \cdots, n - 1$. If only positive values are allowed, express $(x_1, \cdots, x_n)$ in spherical coordinates and consider the angular coordinates.

In the same way one can test the hypothesis $\theta = \theta^0$ under the assumption that $c(x - a)$ has distribution $F(y, \theta)$. One forms $w_\alpha = \dfrac{x_\alpha - x_n}{x_{n-1} - x_n}$, $\alpha = 1, \cdots, n - 2$. Alternatively, one expresses $(x_1 - x_n, \cdots, x_{n-1} - x_n)$ in spherical coordinates and considers the angular coordinates.

Theorem IVa may be applied to analogous problems. In these, the hypothesis $\theta = \theta^0$ is to be tested under the assumption that $y = u - \sum a_i x_i$ has distribution $F(y, \theta)$ for fixed values of the $x_i$, with the $a_i$ unknown. In such problems there exist linear combinations of the observed values of $y$ which are independent of the $a_i$. By Theorem IVa, under certain conditions the joint distribution of these linear combinations determines the original distribution of $y$, without regard to the $a_i$.

In applying some of the preceding results we must verify in certain cases that the zeros of $\int e^{itx}\, dF(x)$ are nowhere dense, for a certain distribution function. By a change of variable the condition of Theorem Ib can be stated in this form; moreover, if $F(x)$ satisfies this condition it is evident that it satisfies the condition of Theorem Ia. Levinson [4] has obtained a sufficient condition applicable to a considerable class of cases. Let $F(x)$ have a density $f(x)$ which is $O(e^{-\theta(x)})$ as $x \to \infty$, where $\theta(x)$ is monotone and $\int^{\infty}\theta(x)x^{-2}\, dx$ diverges to $\infty$. Then $\int e^{itx}f(x)\, dx$ cannot vanish on an interval without vanishing identically. It is evident that it is likewise sufficient if the corresponding condition holds as $x \to -\infty$ instead of $+\infty$. In particular, suppose there exists $A$ such that $f(x) = 0$ for $x > A$ (or for $x < A$). Then it is a consequence of the Levinson result that $\int e^{itx}f(x)\, dx$ has no intervals of zeros.

It can be established easily that if $f(x)$ is majorized by $|x|^\alpha$, $\alpha > 0$, in the neighborhood of the origin, then $\int e^{it\log|x|}f(x)\, dx$ has no intervals of zeros.


As a simple example consider the rectangular distribution on $(0, 1)$. Let $(x - a)/r$ have this distribution with $a$ unknown, $r > 0$, and suppose that we are interested only in $r$. Given a sample $x_1, \cdots, x_n$ form the functions $y_\alpha = (x_\alpha - x_n)/r$, $\alpha = 1, \cdots, n - 1$. Set $y_M = \max(y_1, \cdots, y_{n-1}, 0)$, $y_m = \min(y_1, \cdots, y_{n-1}, 0)$. Then it can be shown that $y_1, \cdots, y_{n-1}$ have probability density $(1 - y_M + y_m)$ in the region $-1 < y_\alpha < 1$, $y_M - y_m < 1$, and zero elsewhere. $\xi = y_M - y_m$ is of course the quotient of the sample range by $r$. It can be shown that $\xi$ has density $n(n - 1)(1 - \xi)\xi^{n-2}$, $0 < \xi < 1$. Theorem Ia makes it possible to base any tests not involving $a$ on the distribution of the $y_\alpha$: if the $y_\alpha$ have the stated distribution, then $(x - a)/r$ for some $a$ must have the rectangular distribution.
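A quick Monte Carlo sketch can be used to check the stated density of $\xi$. Its mean is $\int_0^1 \xi\, n(n-1)(1-\xi)\xi^{n-2}\, d\xi = (n-1)/(n+1)$, and the simulated ratios of range to $r$ should agree with it whatever the value of $a$. The values $n = 5$, $a = -4$ and $r = 2$, the seed, and the number of replications are assumptions chosen only for illustration.

```python
import random


def range_ratio(sample, r):
    """xi = y_M - y_m, where y_alpha = (x_alpha - x_n)/r and 0 (for alpha = n) is included."""
    y = [(x - sample[-1]) / r for x in sample[:-1]] + [0.0]
    return max(y) - min(y)


def range_density(xi, n):
    """Stated density of xi: n(n-1)(1 - xi) xi^(n-2) on 0 < xi < 1, zero elsewhere."""
    return n * (n - 1) * (1.0 - xi) * xi ** (n - 2) if 0.0 < xi < 1.0 else 0.0


random.seed(3)
n, a, r = 5, -4.0, 2.0                      # hypothetical sample size, location, scale
draws = [range_ratio([a + r * random.random() for _ in range(n)], r)
         for _ in range(20000)]
sim_mean = sum(draws) / len(draws)

steps = 10000                               # midpoint rule on (0, 1)
mids = [(k + 0.5) / steps for k in range(steps)]
total = sum(range_density(u, n) for u in mids) / steps
num_mean = sum(u * range_density(u, n) for u in mids) / steps
print(total, num_mean, sim_mean, (n - 1) / (n + 1))
```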

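Similarly, suppose $y = (x - a)/r$ has the distribution $e^{-y}$, $y > 0$, for some $a$, $r$. Then $w_\alpha = (x_\alpha - x_n)/r$, $\alpha = 1, 2, \cdots, n - 1$, have distribution density

$$\frac{1}{n}\exp\left\{-\sum_{\alpha=1}^{n-1} w_\alpha + n\min(0, w_1, \cdots, w_{n-1})\right\}.$$

Again, the latter distribution may be used to estimate $r$.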
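The density stated for the exponential case can be checked in the simplest case $n = 2$. There it reduces to $\frac{1}{2}e^{-|w_1|}$, so that $P(w_1 > 1) = \frac{1}{2}e^{-1}$ and $P(w_1 < -\frac{1}{2}) = \frac{1}{2}e^{-1/2}$. The sketch below compares a midpoint-rule integral of the density with a simulation. The values of $a$ and $r$, the seed and the number of trials are hypothetical.

```python
import math
import random


def exp_location_scale_density(w):
    """Joint density of w_alpha = (x_alpha - x_n) / r, alpha = 1, ..., n - 1, when (x - a)/r
    has density e^{-y}, y > 0:  (1/n) exp(-sum(w) + n * min(0, w_1, ..., w_{n-1}))."""
    n = len(w) + 1
    return math.exp(-sum(w) + n * min([0.0] + list(w))) / n


# n = 2: tail probabilities from the density by the midpoint rule.
steps, upper = 20000, 40.0
h = upper / steps
p_right = sum(exp_location_scale_density([1.0 + (k + 0.5) * h]) for k in range(steps)) * h
p_left = sum(exp_location_scale_density([-0.5 - (k + 0.5) * h]) for k in range(steps)) * h

# The same probabilities by simulation; w_1 does not depend on the location a.
random.seed(4)
a, r, trials = 1.5, 0.7, 50000
hits_right = hits_left = 0
for _ in range(trials):
    x1, x2 = (a + r * random.expovariate(1.0) for _ in range(2))
    w1 = (x1 - x2) / r
    hits_right += w1 > 1.0
    hits_left += w1 < -0.5
print(p_right, hits_right / trials, 0.5 * math.exp(-1.0))
print(p_left, hits_left / trials, 0.5 * math.exp(-0.5))
```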

Let us examine the distributions of functions of the type considered, in the case of normality. Assume that $x_1, \cdots, x_n$ are a sample of $n$ observations from a normal distribution with unit variance and unknown mean. The variables $y_\alpha = x_\alpha - x_1$, $\alpha = 2, \cdots, n$, have a joint normal distribution with zero means and matrix of variances and covariances $\|\lambda_{ij}\| = \|1 + \delta_{ij}\|$. Then Theorem Ia shows that if $\{y_\alpha\}$ have this joint distribution then $x$ is normally distributed with unit variance. Note that $\chi^2_{n-1} = \sum\lambda^{ij}y_iy_j = \sum(x_\alpha - \bar{x})^2$, where $\|\lambda^{ij}\|$ is the inverse of $\|\lambda_{ij}\|$. If we had $x = x'/\sigma$, then $\sum(x'_\alpha - \bar{x}')^2 = \sigma^2\chi^2_{n-1}$, giving the estimate

$$\hat{\sigma}^2 = \frac{1}{n-1}\sum(x'_\alpha - \bar{x}')^2.$$

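There are, of course, many ways in which the matrix $\|\lambda_{ij}\|$ may be transformed into a diagonal matrix in order to obtain a new set of independently distributed variates; one convenient set is the set

$$\sqrt{\tfrac{1}{2}}\,y_2, \quad \sqrt{\tfrac{2}{3}}\left(y_3 - \tfrac{1}{2}y_2\right), \quad \cdots, \quad \sqrt{\tfrac{n-1}{n}}\left(y_n - \frac{1}{n-1}\sum_{\alpha=2}^{n-1} y_\alpha\right).$$

In terms of the original $x$'s we have

$$\sqrt{\tfrac{1}{2}}(x_2 - x_1), \quad \sqrt{\tfrac{2}{3}}\left(x_3 - \tfrac{1}{2}(x_1 + x_2)\right), \quad \cdots, \quad \sqrt{\tfrac{n-1}{n}}\left(x_n - \frac{1}{n-1}\sum_{\alpha=1}^{n-1} x_\alpha\right);$$

these functions of the data are independently distributed according to the normal distribution with zero mean and unit variance.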
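The set of contrasts given in terms of the original $x$'s (the Helmert transformation) can be sketched in a few lines. The check below runs on an illustrative simulated sample with mean 3. It confirms that the sum of squares of the contrasts equals $\sum(x_\alpha - \bar{x})^2$ and that the contrasts do not depend on the unknown mean. The function name is hypothetical.

```python
import math
import random


def helmert_contrasts(x):
    """sqrt(m/(m+1)) * (x_{m+1} - (x_1 + ... + x_m)/m), m = 1, ..., n - 1."""
    out, running_sum = [], 0.0
    for m in range(1, len(x)):
        running_sum += x[m - 1]                    # x_1 + ... + x_m
        out.append(math.sqrt(m / (m + 1)) * (x[m] - running_sum / m))
    return out


random.seed(5)
x = [random.gauss(3.0, 1.0) for _ in range(8)]     # unit variance, "unknown" mean 3
u = helmert_contrasts(x)
xbar = sum(x) / len(x)
print(sum(v * v for v in u), sum((xi - xbar) ** 2 for xi in x))
print(max(abs(p - q) for p, q in zip(u, helmert_contrasts([xi - 3.0 for xi in x]))))
```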

Similarly, in the case of a sample $x_1, \dots, x_n$ from a normal distribution with zero mean and unknown variance, there exists a set of $n - 1$ functions with distributions independent of the variance. A convenient set of functions is the set

$$t_m = \frac{\sqrt{m}\,x_{m+1}}{\sqrt{\sum_{i=1}^{m} x_i^2}}, \qquad m = 1, \dots, n-1.$$

It is known (see Bartlett [1]) that the variables $t_m$ are independently distributed according to Student distributions with $m$ degrees of freedom respectively. The set $\{t_m\}$ determines the set of angular coordinates obtained by expressing $x_1, \dots, x_n$ in spherical coordinates; hence we can conclude, conversely, that if the $\{t_m\}$ have this joint distribution then $x$ is normal with mean zero.
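A minimal sketch of computing the $t_m$, assuming (as the text does) a sample from a normal distribution with mean zero; the function name and data are hypothetical. The check shows the property that matters here: a change of scale of all the observations leaves every $t_m$ unchanged, so the unknown variance drops out.

```python
import math

def t_statistics(x):
    """t_m = sqrt(m) * x_{m+1} / sqrt(x_1^2 + ... + x_m^2), m = 1, ..., n - 1.

    Assumes the x's come from a normal distribution with mean zero.
    """
    out, ss = [], 0.0
    for m in range(1, len(x)):
        ss += x[m - 1] ** 2          # running sum x_1^2 + ... + x_m^2
        out.append(math.sqrt(m) * x[m] / math.sqrt(ss))
    return out

x = [0.3, -1.2, 0.8, 2.1, -0.4]                  # hypothetical zero-mean sample
t_orig = t_statistics(x)
t_scaled = t_statistics([5.0 * v for v in x])    # rescaled data give the same t_m
```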
Finally we can eliminate both mean and variance. Suppose $x_1, \dots, x_n$ are a sample from some normal distribution. The variables

$$w_m = \sqrt{\frac{m}{m+1}}\Big(x_{m+1} - \frac{1}{m}\sum_{i=1}^{m} x_i\Big), \qquad m = 1, 2, \dots, n-1,$$

are normal and independent with mean zero and some variance. Then we have the set

$$t'_r = \frac{\sqrt{r}\,w_{r+1}}{\sqrt{\sum_{j=1}^{r} w_j^2}}, \qquad r = 1, \dots, n-2,$$

independently distributed according to $t$-distributions with $r$ degrees of freedom respectively. It may be convenient for computational purposes to make use of the identity

$$\sum_{j=1}^{r} \frac{j}{j+1}\Big(x_{j+1} - \frac{1}{j}\sum_{i=1}^{j} x_i\Big)^2 = \sum_{i=1}^{r+1} x_i^2 - \frac{1}{r+1}\Big(\sum_{i=1}^{r+1} x_i\Big)^2.$$

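We then have

$$t'_r = \frac{\sqrt{r}\,\sqrt{\frac{r+1}{r+2}}\Big(x_{r+2} - \frac{1}{r+1}\sum_{i=1}^{r+1} x_i\Big)}{\sqrt{\sum_{i=1}^{r+1} x_i^2 - \frac{1}{r+1}\Big(\sum_{i=1}^{r+1} x_i\Big)^2}}, \qquad r = 1, \dots, n-2.$$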
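To see that the computational form agrees with the definition, the sketch below computes $t'_r$ both ways: directly from the Helmert-type variables $w_m = \sqrt{m/(m+1)}\,(x_{m+1} - \bar{x}_m)$, and through the sum-of-squares identity. It also checks that a change of location and scale leaves every $t'_r$ unchanged. Function names and data are hypothetical.

```python
import math

def helmert_w(x):
    """w_m = sqrt(m/(m+1)) * (x_{m+1} - mean(x_1..x_m)), m = 1..n-1."""
    w, total = [], 0.0
    for m in range(1, len(x)):
        total += x[m - 1]
        w.append(math.sqrt(m / (m + 1)) * (x[m] - total / m))
    return w

def t_prime_direct(x):
    """t'_r = sqrt(r) * w_{r+1} / sqrt(w_1^2 + ... + w_r^2), r = 1..n-2."""
    w = helmert_w(x)
    out, ss = [], 0.0
    for r in range(1, len(w)):
        ss += w[r - 1] ** 2
        out.append(math.sqrt(r) * w[r] / math.sqrt(ss))
    return out

def t_prime_identity(x):
    """Same quantities, replacing sum_{j<=r} w_j^2 by sum_{i<=r+1} (x_i - mean)^2."""
    out = []
    for r in range(1, len(x) - 1):
        head = x[: r + 1]
        mean = sum(head) / (r + 1)
        ss = sum((v - mean) ** 2 for v in head)
        w_next = math.sqrt((r + 1) / (r + 2)) * (x[r + 1] - mean)
        out.append(math.sqrt(r) * w_next / math.sqrt(ss))
    return out

x = [2.0, 3.5, 1.0, 4.2, 2.8, 3.3]                      # hypothetical sample
direct = t_prime_direct(x)
via_identity = t_prime_identity(x)
shifted = t_prime_identity([10.0 + 3.0 * v for v in x])  # new mean and variance
```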


Now, by Theorem IIIc, we know that if the set $\{t'_r\}$ has this specified distribution then $x$ must be distributed according to some normal distribution. The set $\{t'_r\}$ may be used to test the goodness of fit of the observations to normality, by first adjusting the set $\{t'_r\}$ to a standard basis of comparison, i.e., by considering $F_r(t'_r)$, where $F_r$ is the corresponding cumulative distribution function, and then applying, for example, a $\chi^2$ goodness of fit test to these $n - 2$ quantities, with respect to the rectangular distribution on $(0, 1)$.
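The procedure just described can be sketched in Python with the standard library only. The Student distribution function for integer degrees of freedom is computed from the classical closed-form series in $\theta = \arctan(t/\sqrt{\nu})$; the number of bins (five), the sample, and all function names are our own hypothetical choices, not prescriptions from the text.

```python
import math
import random

def student_t_cdf(t, df):
    """CDF of Student's t for a positive integer df, via the closed-form
    series in theta = atan(t / sqrt(df)); odd and even df are handled separately."""
    theta = math.atan(t / math.sqrt(df))
    s, c2 = math.sin(theta), math.cos(theta) ** 2
    total = 0.0
    if df % 2 == 1:
        term, k = math.cos(theta), 1        # cos^1, cos^3, ..., cos^(df-2)
        for _ in range((df - 1) // 2):
            total += term
            term *= c2 * (k + 1) / (k + 2)
            k += 2
        a = 2.0 / math.pi * (theta + s * total)
    else:
        term, k = 1.0, 0                    # cos^0, cos^2, ..., cos^(df-2)
        for _ in range(df // 2):
            total += term
            term *= c2 * (k + 1) / (k + 2)
            k += 2
        a = s * total
    return 0.5 * (1.0 + a)

def t_prime(x):
    """t'_r, r = 1..n-2, computed with the sum-of-squares identity."""
    out = []
    for r in range(1, len(x) - 1):
        head = x[: r + 1]
        mean = sum(head) / (r + 1)
        ss = sum((v - mean) ** 2 for v in head)
        w_next = math.sqrt((r + 1) / (r + 2)) * (x[r + 1] - mean)
        out.append(math.sqrt(r) * w_next / math.sqrt(ss))
    return out

def chi_square_uniform(us, bins=5):
    """Pearson chi-square of the u's against equal-width bins on (0, 1)."""
    counts = [0] * bins
    for u in us:
        counts[min(int(u * bins), bins - 1)] += 1
    expected = len(us) / bins
    return sum((c - expected) ** 2 / expected for c in counts)

rng = random.Random(2024)
x = [rng.gauss(50.0, 8.0) for _ in range(62)]        # hypothetical sample, n = 62
us = [student_t_cdf(t, r) for r, t in enumerate(t_prime(x), start=1)]
stat = chi_square_uniform(us)                        # refer to chi-square with bins - 1 df
```

Equal-width bins are only one choice; any test of uniformity on $(0, 1)$ could be applied to the `us`.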
THE SELECTION OF VARIATES FOR USE IN PREDICTION WITH
SOME COMMENTS ON THE GENERAL PROBLEM OF 
NUISANCE PARAMETERS 

By Harold Hotelling 

1. Maximum Correlation as a Test For predicting or estimating a particular 
variate y there is frequently available an embarrassingly large number of other 
variates having some correlation with y. For example, in fitting demand 
functions by means of economic time series, the number of series of observations 
having some relation to the demand which is sought to be estimated is apt to be 
very large, whereas the number of good independent observations on each is 
quite small. The proper coefficients in the regression equation must ordinarily 
be determined from the observations, and must not exceed in number the ob- 
servations on each variate. Furthermore, in order to have a measure of error 
that will make it possible to distinguish real effects from those due to chance, it is necessary that the number of predictors¹ shall be enough less than the number of observations on each variate so that the residual chance variance can be determined with an appropriate degree of accuracy. It is desirable to select a set of predictors yielding estimates of maximum but determinable accuracy, and at the same time to avoid the fallacies of selection among numerous results of that one which appears most significant and treating it as if it were the only one examined.

Considerations other than maximum and determinate accuracy are of practical importance. The labor of calculation by the method of least squares becomes a serious obstacle to the use of the theoretically optimum set of variates when these are very numerous, though the rapid current development of mechanical and electrical devices suitable for these computations offers a hope that the limits now set in practice in this way will soon be considerably increased. Furthermore, predictions or estimates must, as in speculative business or in military activity, be made from moment to moment, often in a rough manner by persons incapable of or averse to using complex formulae, and in such activities frequent revisions of the regression equations must be made to accord with altered conditions. Also, in temporal predictions, the time of availability of

¹ I use this term for what are often called the independent variates in a regression
equation, since these ordinarily are not really independent in the probability sense. Simi- 
larly I shall call the ‘‘dependent” variate the predictand. By prediction I mean merely the 
use of regression equations to estimate some unknown variate by means of the values of 
related variates, without any necessary connotation of temporal order, though the most 
interesting applications seem for the most part to be those in which we pass from a knowl- 
edge of the past to an estimate of the future. 
the values of the predictors is important, since an early prediction (e.g. of the size of a harvest) is more valuable than a later one of the same accuracy.

If we make the usual assumption² that the probability distribution of $y$ is, for every set of values of the predictors, normal with a fixed variance $\sigma^2$ and an expectation that is a linear function of the predictors, we shall wish to minimize $\sigma^2$ subject to appropriate limitations, and this amounts to the same thing as maximizing the multiple correlation $\rho$ of $y$ with the predictors, since $1 - \rho^2$ is the ratio of $\sigma^2$ to the total variance of $y$, which is the same for all sets of predictors. The estimates $s$ and $R$ of $\sigma$ and $\rho$ obtained from the available sample are of course a different matter. But it is clear that the value of $R$ provides a suitable criterion of choice under the following conditions: We are called upon to choose one among two or more sets, each consisting of a fixed number of predictors; for each predictor we have a known value corresponding to each of the values $y_1, \dots, y_N$ observed for the predictand; and there is no basis for preferring one
of these sets to another either in theory, in observations extraneous to those just 
specified, or in cost or time of availability. In particular, if just one predictor is 
to be used, that having the highest sample correlation with the predictand should 
under these conditions be the one adopted. But in making such a choice a test 
of its accuracy is required, to take account of the possibility that the wrong 
choice has been made because of chance fluctuations in the sample correlation 
coefficients. 

There are innumerable economic variates available for prediction of 
business conditions, and most of these are highly correlated with each other. 
The selection of one business index instead of another for a particular purpose will involve the question which has exhibited the higher correlation with the quantity to be predicted, and consequently the question of the definiteness with which the difference between the calculated correlations can be
regarded as significant. 

Our problem evidently has a bearing on governmental policy in selecting 
among the numerous series of data those whose continuation will be most valuable. The high cost of assembling these statistics dictates a careful selection of a limited number of series having little correlation with each other's current
values, but with correlations as great as possible with those things whose predic- 
tion or estimation is most important. 

2. The Choice of one Predictor with Two Available. Let us take first the 
simplest case, which may be illustrated by a Michigan State College problem of 

² We shall not here go into the question of the applicability of these standard assumptions to time series otherwise than to note that some transformations of observations ordered in time are usually necessary and sufficient to obtain quantities satisfying the assumptions so closely that deviations from them cannot be detected. Such transformations include replacing a variate by its logarithm, and eliminating trend and seasonal variations by least squares. In view of the satisfactory adjusted observations found empirically by these and similar methods, the usual objections to studying time series by exact methods seem much exaggerated.

which Dr. W. D. Baten has told me. The ultimate weight of a mature ox is estimated by means of his length at an early age. The question has been raised, however, whether a more accurate prediction might not be made by means of the calf's girth at his heart. Records were at hand of 13 oxen showing their lengths and girths as calves and also their weights when mature. A regression equation involving both length and girth would presumably give greater accuracy than either variate alone; but it appears that those who make the estimates desire a simple formula involving only one variate. Suppose, then, that in such a sample the correlation of weight with girth is $r_1 = .7$, that the correlation of weight with length is $r_2 = .5$, and that the correlation of girth with length is $r_0 = .4$. Is the difference $r_1 - r_2 = .2$ sufficiently great in relation to its sampling errors to warrant the inference that girth is really a better predictor than length, or must the question be left in abeyance until more observations can be accumulated?

A straightforward procedure which would have been used with little question 
before the advent of modern exact methods is to calculate the asymptotic approximation to the standard error of $r_1 - r_2$ by the differential method, assuming
the three variates to have the trivariate normal distribution, and to regard the 
difference of the correlations as significant if it exceeds a multiple of this standard 
error determined by the tables of the normal distribution. The calculation of 
the asymptotic approximation may be carried out in the following manner. 
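Let $\rho_1$, $\rho_2$, and $\rho_0$ be the population values of $r_1$, $r_2$, and $r_0$ respectively. Then if $\sigma_{ij}$ denote the population covariance of $x_i$ and $x_j$ $(i, j = 0, 1, 2)$, where $x_0$ stands for the predictand, we have

$$\rho_1 = \frac{\sigma_{01}}{\sqrt{\sigma_{00}\sigma_{11}}},$$

with similar formulae for $\rho_2$ and $\rho_0$. Likewise the sample estimates of these parameters are given by such expressions as

$$r_1 = \frac{s_{01}}{\sqrt{s_{00}s_{11}}}.$$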

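Taking the logarithm of this last expression, expanding about the population values, denoting by the operator $\delta$ the deviation of sample from population values of the covariances, and the resultant deviation in $r_1$, and dropping terms of order higher than the first, we have:

$$\delta r_1 = \rho_1\Big(\frac{\delta s_{01}}{\sigma_{01}} - \frac{\delta s_{00}}{2\sigma_{00}} - \frac{\delta s_{11}}{2\sigma_{11}}\Big).$$

In the same way

$$\delta r_2 = \rho_2\Big(\frac{\delta s_{02}}{\sigma_{02}} - \frac{\delta s_{00}}{2\sigma_{00}} - \frac{\delta s_{22}}{2\sigma_{22}}\Big).$$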

The asymptotic value of the sampling covariance is obtained by multiplying these two expressions together and taking the expectation. The sampling covariance of two estimates of covariance of the usual kind (sum of products divided by number of degrees of freedom) in the same sample, having $n$ degrees of freedom (which ordinarily means that there are $n + 1$ individuals in the sample and that the means are eliminated), is given exactly by the formula³

$$E(\delta s_{ij}\,\delta s_{km}) = (\sigma_{ik}\sigma_{jm} + \sigma_{im}\sigma_{jk})/n,$$

in which the subscripts may have any values, equal or unequal. When this 
formula is applied to each of the nine terms of the product and the results are 
expressed in terms of the correlations $\rho_i$, there results the asymptotic expression for the covariance given by

$$nE(\delta r_1\,\delta r_2) = \tfrac{1}{2}\rho_1\rho_2(\rho_1^2 + \rho_2^2 + \rho_0^2 - 1) + \rho_0(1 - \rho_1^2 - \rho_2^2).$$


This method provides also one of the derivations of the familiar formula which may be written

$$n\sigma_{r_1}^2 = nE(\delta r_1)^2 = (1 - \rho_1^2)^2, \qquad n\sigma_{r_2}^2 = (1 - \rho_2^2)^2.$$
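The nine-term expectation can be carried out mechanically. The sketch below takes unit population variances (harmless, since correlations are scale free), so that $\delta r_j = \delta s_{0j} - \tfrac{1}{2}\rho_j(\delta s_{00} + \delta s_{jj})$ with index 0 for the predictand, applies $nE(\delta s_{ij}\delta s_{km}) = \sigma_{ik}\sigma_{jm} + \sigma_{im}\sigma_{jk}$ term by term, and compares the results with the closed forms above. The correlation values and function names are hypothetical.

```python
def n_cov_ds(i, j, k, m, sigma):
    """n * E(ds_ij ds_km) = sigma_ik sigma_jm + sigma_im sigma_jk."""
    return sigma[i][k] * sigma[j][m] + sigma[i][m] * sigma[j][k]

def linear_terms(rho, j):
    """First-order expansion of dr_j with unit population variances:
    dr_j = ds_0j - (rho_j / 2) ds_00 - (rho_j / 2) ds_jj (index 0 = predictand)."""
    return [((0, j), 1.0), ((0, 0), -rho / 2), ((j, j), -rho / 2)]

def n_moment(terms_a, terms_b, sigma):
    """n * E(dr_a dr_b): expectation of the product of two three-term expansions."""
    return sum(ca * cb * n_cov_ds(i, j, k, m, sigma)
               for (i, j), ca in terms_a for (k, m), cb in terms_b)

rho1, rho2, rho0 = 0.7, 0.5, 0.4        # hypothetical population correlations
sigma = [[1.0, rho1, rho2], [rho1, 1.0, rho0], [rho2, rho0, 1.0]]
d1, d2 = linear_terms(rho1, 1), linear_terms(rho2, 2)

cov12 = n_moment(d1, d2, sigma)         # the nine terms of the product
var1 = n_moment(d1, d1, sigma)
closed_cov = (0.5 * rho1 * rho2 * (rho1**2 + rho2**2 + rho0**2 - 1)
              + rho0 * (1 - rho1**2 - rho2**2))
```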

The variance of the difference of $r_1$ and $r_2$ is the sum of their variances minus twice their covariance. Hence

$$n\sigma_{r_1 - r_2}^2 = (1 - \rho_1^2)^2 + (1 - \rho_2^2)^2 - \rho_1\rho_2(\rho_1^2 + \rho_2^2 + \rho_0^2 - 1) + 2\rho_0(\rho_1^2 + \rho_2^2 - 1).$$


We are testing the hypothesis that $\rho_1 = \rho_2$. If we put a common value $\rho$ for them in the last expression and simplify, we obtain for the standard error of the difference,

$$\sigma_{r_1 - r_2} = \sqrt{\frac{(1 - \rho_0)(2 - 3\rho^2 + \rho_0\rho^2)}{n}}.$$

The second factor in parentheses is always positive because of the inequalities limiting the correlations among three variates.
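The sketch below checks numerically that the general variance formula reduces to the common-$\rho$ form, and then evaluates the plug-in ratio that the next paragraph criticizes, using the illustrative figures of Section 2 ($r_1 = .7$, $r_2 = .5$, $r_0 = .4$, 13 individuals, hence $n = 12$ degrees of freedom) with $\rho$ estimated by the average of $r_1$ and $r_2$. Function names are hypothetical, and the resulting ratio carries no exact significance level, which is precisely the text's objection.

```python
import math

def n_var_diff(rho1, rho2, rho0):
    """n * sigma^2_{r1 - r2}: the two variances minus twice the covariance."""
    return ((1 - rho1**2) ** 2 + (1 - rho2**2) ** 2
            - rho1 * rho2 * (rho1**2 + rho2**2 + rho0**2 - 1)
            + 2 * rho0 * (rho1**2 + rho2**2 - 1))

def se_diff_common(rho, rho0, n):
    """Standard error of r1 - r2 when rho1 = rho2 = rho."""
    return math.sqrt((1 - rho0) * (2 - 3 * rho**2 + rho0 * rho**2) / n)

# The two forms agree at a common value of rho.
general = math.sqrt(n_var_diff(0.6, 0.6, 0.4) / 12)
special = se_diff_common(0.6, 0.4, 12)

# Plug-in ratio: rho from the average of r1 and r2, rho0 from r0, n = 12.
z = (0.7 - 0.5) / se_diff_common((0.7 + 0.5) / 2, 0.4, 12)
```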

This formula contains two unknown parameters, $\rho$ and $\rho_0$. The classical procedure would be to substitute $r_1$, $r_2$ and $r_0$ respectively for $\rho_1$, $\rho_2$, and $\rho_0$ in the previous formula, and use the resulting standard error expression as if the ratio to it of $r_1 - r_2$ were normally distributed. A first modification, more in line with modern ideas, would be to use some kind of average of $r_1$ and $r_2$ as an estimate of both $\rho_1$ and $\rho_2$, since the null hypothesis tested is that these are equal. But whatever sample estimates we substitute for $\rho$ and $\rho_0$, the formula remains unsatisfactory, since no suitable limits of error are available. If instead of the standard error we were to work out the exact distribution of $r_1 - r_2$ we should still not be free from the difficulty. This exact distribution clearly involves both $\rho$ and $\rho_0$, since its variance does so. Neither can we escape from the trouble by using some function $z = f(r)$, such as the inverse hyperbolic tangent suggested by R. A. Fisher, and considering the standard error of $z_1 - z_2 = f(r_1) - f(r_2)$; for this standard error will have as the first term in its expansion in a series of powers of $n^{-1/2}$ simply the product of the expression above for $\sigma_{r_1 - r_2}$ by $f'(\rho)$; and this must clearly involve both $\rho_0$ and $\rho$.

³ I have given a derivation of this formula from the characteristic function of the multivariate normal distribution [1]. Numerous special cases appear in earlier literature. The derivation above is a simplification and improvement of several versions, appearing in the various early writings of Karl Pearson.

3. Nuisance Parameters. This is not by any means the only statistical prob- 
lem in which unknown and undesired parameters enter into the distribution of 
the statistic which we should naturally use to test a hypothesis. Indeed, the 
early investigation which was perhaps most influential in setting the whole tone 
of modern statistical research was that [2] in which W. S. Gosset (“Student”)
arrived at the exact distribution of the ratio of a deviation in the mean to the 
estimated standard error. The previous practice (which unfortunately survives 
today in some quarters, and is even taught to students without explaining its 
approximate character) was to neglect the sampling errors in the estimate of 
the unknown variance and to treat the ratio as normally distributed with 
unit variance. The rigorous derivation by Fisher [3] of the Student distribution 
makes clear the manner in which the nuisance parameter may in this, and in 
some other, problems be eradicated from the distribution through integration, 
after altering the original statistic (the deviation in the mean) by dividing it 
by another statistic. The new statistic, the Student ratio, vanishes whenever 
the old statistic, the deviation in the mean, does so, and the same hypothesis 
is tested by both. This then is one way to get rid of a nuisance parameter: 
when you have a statistic estimating a parameter whose vanishing is in question, 
but whose distribution involves another parameter, alter the statistic by multi- 
plying or dividing by another statistic in such a way that the new function 
vanishes whenever the old one does so; and do this in such a way that the new distribution will be independent of the nuisance parameter. Unhappily, this method has been applied successfully only in particular cases, and no way to use it in the problem at hand has been found.

A second method is that of transformation employed by Fisher in dealing with 
such problems as testing the significance of the difference between the correla- 
tion coefficients in independent samples between the same two variates. The 
need for the transformation in this case is occasioned by the presence in the 
distribution of the difference of the sample correlations of the unknown true 
value, which is not directly relevant to the comparison. We have seen that 
this method also fails to solve our problem. 

A third method of dealing with nuisance parameters is the use of fiducial probability by R. A. Fisher [4] and by Daisy M. Starkey [5] in testing the significance of the difference between the means of two samples when the variances may be unequal. Criticisms of these applications of fiducial probability have been made by M. S. Bartlett [6] and B. L. Welch [7], and the field of applicability of such methods is still in need of elucidation.

Some findings of J. Neyman [8] having a bearing on the general nuisance parameter problem should also be noted.

The only other class of methods for dealing with nuisance parameters of which I am aware involves the comparison of the particular sample obtained, not with the whole population of samples with which a comparison might be made if we knew the value of the troublesome parameter, but with a sub-population selected with reference to the sample in such a way that the distribution, in this sub-population, of the statistic used does not involve any unknown parameter. An example is the testing of significance of a regression coefficient. Thus if we suppose that a sample of values of $x$ and $y$ is drawn from a bivariate normal population, and calculate the regression coefficient $b$ of $y$ on $x$ in the sample, the distribution of $b$ involves not only the population value $\beta$, but also the ratio $a$ of the variances in the population. Since this second parameter is unknown, and can only be estimated from the sample, it is not possible to use the distribution of $b$ in the whole population directly to test the significance of $b - \beta$. What we do is to find the place of this difference, not in the whole population of values in which both $x$ and $y$ are drawn at random, but in a sub-population for which the values of $x$ are the same as in our sample. We may alternatively say that we limit the sub-population only to that for which the sum of the squares of the deviations of the values of $x$ from their mean is the same as in our sample; the results are the same. The distribution in this sub-population of the ratio of $b - \beta$ to its estimated standard error is of the Student form, with no unknown parameters, and on this basis it is possible to make exact and satisfactory tests and to set up fiducial limits for $b$. Another example is that of contingency tables. The practice now accepted (after a controversy) for testing independence of two modes of classification, such as classification of persons according as they have or have not been vaccinated, and again according as they live through an epidemic or die, is to compare the observed contingency table, not with all possible contingency tables of the same numbers of rows and columns, but only with the possible contingency tables having exactly the same marginal totals as the observed table.
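The conditioning on marginal totals can be illustrated with what is now usually called Fisher's exact test for a $2 \times 2$ table: every table with the observed margins is weighted by its hypergeometric probability, and the observed table is located within that set. This is a standard example that we supply, not a computation from the paper; the table entries and function names are hypothetical.

```python
from math import comb

def table_probability(a, row1, col1, total):
    """P(top-left cell = a) among 2x2 tables with the given margins (hypergeometric)."""
    return comb(col1, a) * comb(total - col1, row1 - a) / comb(total, row1)

def fisher_one_sided(table):
    """P(top-left cell >= observed), conditioning on both sets of marginal totals."""
    (a, b), (c, d) = table
    row1, col1, total = a + b, a + c, a + b + c + d
    return sum(table_probability(k, row1, col1, total)
               for k in range(a, min(row1, col1) + 1))

# Hypothetical counts: rows = vaccinated / not vaccinated, columns = survived / died.
observed = [[3, 1], [1, 3]]
p_value = fisher_one_sided(observed)
```

For this table the only tables at least as extreme with the same margins have top-left cells 3 and 4, giving $(16 + 1)/70$.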

4. An Exact Solution. We shall solve the problem of the significance of the difference of $r_1$ and $r_2$ with the understanding that the meaning of significance is to be interpreted by reference to the sub-population of possible samples for which the predictors $x_1$ and $x_2$ have the same set of values as those observed in the particular sample available. This procedure, besides yielding an exact distribution without unknown parameters, has the advantage of relaxing the stringency of the requirement of a trivariate normal distribution. We now make only the assumptions customary in the method of least squares, that the predictand $y$ has the univariate normal distribution for each set of values of $x_1$ and $x_2$, independently for the different sets, with a common variance $\sigma^2$, and with the expectation of $y$ for a fixed pair of values of the predictors a linear function of these predictors. No assumption is involved regarding the distribution of the predictors, since we regard them as fixed in all the samples with which we compare our particular sample. The advantages of exactness and of freedom
from the somewhat special trivariate normal assumption are attained at the 
expense of sacrificing the precise applicability of the results to other sets of 
values of the predictors. 

Since the correlational properties are unchanged by additive and multiplicative constants, we may suppose that

$$Sx_1 = 0 = Sx_2, \qquad Sx_1^2 = 1 = Sx_2^2, \tag{1}$$

where $S$ stands for summation over a sample of $N$ individuals. The notation may be made more explicit by the adjunction of an additional subscript $\alpha$, varying from 1 to $N$, to denote the individual member of the sample, so that instead of $Sx_1$, for example, we might write $Sx_{1\alpha}$. The omission of this additional subscript is convenient and will usually leave no ambiguity when we deal with sums, but it will be convenient to retain it in connection with individual values. The correlation $r_0$ of $x_1$ with $x_2$ in all those samples we shall consider is, by (1),

$$r_0 = Sx_1x_2.$$

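Now consider the new quantities

$$x'_\alpha = \frac{x_{1\alpha} + x_{2\alpha}}{\sqrt{2(1 + r_0)}}, \qquad x''_\alpha = \frac{x_{1\alpha} - x_{2\alpha}}{\sqrt{2(1 - r_0)}}. \tag{2}$$

Evidently, from (1) and (2),

$$Sx' = 0 = Sx'', \qquad Sx'^2 = 1 = Sx''^2, \qquad Sx'x'' = 0. \tag{3}$$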
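A small numerical check of (3): after standardizing two predictors as in (1), the sum and difference variables of (2) have zero sums, unit sums of squares, and zero sum of products. The predictor values and function names below are hypothetical.

```python
import math

def standardize(v):
    """Shift and scale so that S v = 0 and S v^2 = 1, as in (1)."""
    mean = sum(v) / len(v)
    dev = [a - mean for a in v]
    norm = math.sqrt(sum(a * a for a in dev))
    return [a / norm for a in dev]

def sum_difference_pair(x1, x2):
    """x' and x'' of (2), built from predictors already standardized by (1)."""
    r0 = sum(a * b for a, b in zip(x1, x2))          # by (1), S x1 x2 = r0
    xp = [(a + b) / math.sqrt(2 * (1 + r0)) for a, b in zip(x1, x2)]
    xpp = [(a - b) / math.sqrt(2 * (1 - r0)) for a, b in zip(x1, x2)]
    return xp, xpp

x1 = standardize([1.0, 2.0, 4.0, 7.0, 11.0])        # hypothetical predictor values
x2 = standardize([2.0, 1.0, 5.0, 6.0, 12.0])
xp, xpp = sum_difference_pair(x1, x2)
```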

Since the mean value $E(y_\alpha)$ is a linear function of $x_{1\alpha}$ and $x_{2\alpha}$, $y_\alpha$ may, upon subtracting a constant from all these expectations, be written

$$y_\alpha = \beta_1 x_{1\alpha} + \beta_2 x_{2\alpha} + \Delta_\alpha, \tag{4}$$

where $\Delta_1, \dots, \Delta_N$ are normally and independently distributed with variances all equal to $\sigma^2$ and expectations zero. The assumption that $x_1$ and $x_2$ are equally correlated with $y$ in the population leads to the conclusion that $\beta_1 = \beta_2$; and putting $\beta = \beta_1\sqrt{2(1 + r_0)}$, we then have from (4) and (2):

$$y_\alpha = \beta x'_\alpha + \Delta_\alpha. \tag{5}$$

Consequently, by (3),

$$Sx''y = Sx''_\alpha y_\alpha = \beta Sx'x'' + Sx''\Delta = Sx''\Delta;$$

and this function has a normal distribution with zero mean and variance $\sigma^2$. If in the sample we work out a regression equation

$$Y = a + b'x' + b''x'',$$

the normal equations for determining $b'$ and $b''$ must by (3) take the simple forms

$$a = \bar{y}, \qquad b' = Sx'y, \qquad b'' = Sx''y.$$

From the general theory of least squares it is known that the sum of squares of residuals is

$$Sv^2 = S(y - Y)^2 = Sy^2 - \bar{y}Sy - (Sx'y)^2 - (Sx''y)^2,$$

and that $Sv^2/\sigma^2$ has the $\chi^2$ distribution with $n = N - 3$ degrees of freedom, independently both of $Sx'y$ and of $Sx''y$. From these facts it follows that

$$t = \frac{\sqrt{n}\,Sx''y}{\sqrt{Sv^2}} \tag{6}$$

has the Student distribution with $n$ degrees of freedom. Since in accordance with the foregoing definitions and (1) we have

$$Sx''y = \frac{s_y(r_1 - r_2)}{\sqrt{2(1 - r_0)}}, \qquad s_y^2 = S(y - \bar{y})^2,$$

and since also it is known that

$$Sv^2 = \frac{s_y^2 D}{1 - r_0^2},$$

where

$$D = \begin{vmatrix} 1 & r_1 & r_2 \\ r_1 & 1 & r_0 \\ r_2 & r_0 & 1 \end{vmatrix} = 1 - r_0^2 - r_1^2 - r_2^2 + 2r_0r_1r_2,$$

(6) may be written

$$t = (r_1 - r_2)\sqrt{\frac{n(1 + r_0)}{2D}}. \tag{7}$$

The probability of a greater value of $|t|$ is given by tables of the Student distribution with $n = N - 3$. If this probability is sufficiently small (which conventionally means less than .05, or sometimes .01) we have a corresponding degree of confidence that the variate chosen because of a higher correlation in the sample has actually a higher correlation than the other in the population.
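Applied to the illustrative ox figures of Section 2 ($r_1 = .7$, $r_2 = .5$, $r_0 = .4$, $N = 13$), the test (7) can be sketched as follows. In place of printed tables, the Student distribution function is computed from the classical closed-form series for integer degrees of freedom; that helper and the other names are our own, hypothetical choices.

```python
import math

def student_t_cdf(t, df):
    """CDF of Student's t for a positive integer df (closed-form series in
    theta = atan(t / sqrt(df)))."""
    theta = math.atan(t / math.sqrt(df))
    s, c2 = math.sin(theta), math.cos(theta) ** 2
    total = 0.0
    if df % 2 == 1:
        term, k = math.cos(theta), 1
        for _ in range((df - 1) // 2):
            total += term
            term *= c2 * (k + 1) / (k + 2)
            k += 2
        a = 2.0 / math.pi * (theta + s * total)
    else:
        term, k = 1.0, 0
        for _ in range(df // 2):
            total += term
            term *= c2 * (k + 1) / (k + 2)
            k += 2
        a = s * total
    return 0.5 * (1.0 + a)

def hotelling_t(r1, r2, r0, N):
    """Equation (7): t = (r1 - r2) sqrt(n (1 + r0) / (2 D)) with n = N - 3."""
    n = N - 3
    D = 1 - r0**2 - r1**2 - r2**2 + 2 * r0 * r1 * r2   # determinant of the correlations
    return (r1 - r2) * math.sqrt(n * (1 + r0) / (2 * D)), n

t, n = hotelling_t(0.7, 0.5, 0.4, 13)
p_two_sided = 2.0 * (1.0 - student_t_cdf(abs(t), n))
```

Here $D = .38$ and $t \approx 0.86$ on 10 degrees of freedom, so the probability of a greater $|t|$ is about .4; by the criterion just stated, these figures alone would not establish that the variate with the higher sample correlation is the better predictor.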


5. The Selection of One Variate from Among Three or More. Suppose that we are to choose one of the variates $x_1, \dots, x_p$ $(p < N - 1)$ in order to predict $y$. We choose the one having highest correlation, and wonder how much confidence to place in this choice. We shall now determine the distribution of a function suitable for testing the hypothesis that there is no real difference between any pair of the correlations of $x_1, \dots, x_p$ with $y$. Again we shall assume the values
of these predictors fixed, and look for the place of our particular sample among 
all samples having these values, with only y free to vary normally by chance. 

Let $a_{ij} = S(x_i - \bar{x}_i)(x_j - \bar{x}_j)$, and let $c_{ij}$ be the cofactor of $a_{ij}$ in the determinant $a$ of these quantities, divided by $a$. Then

$$\sum_i a_{ji}c_{ik} = \begin{cases} 1, & j = k, \\ 0, & j \neq k. \end{cases} \tag{8}$$
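Here $\Sigma$ stands for summation from 1 to $p$. Let

$$w_i = \frac{\sum_j c_{ij}}{\sum_j\sum_k c_{jk}}, \tag{9}$$

$$l_i = S(x_i - \bar{x}_i)y, \tag{10}$$

$$\bar{l} = \sum_i w_i l_i. \tag{11}$$

From (9) it follows that

$$\sum_i w_i = 1. \tag{12}$$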

From the hypothesis that $y$ is in the population equally correlated with all the $x_i$ it follows that $l_1, \dots, l_p$ have equal expectations, which we may denote by $\lambda$; and from (11) and (12) it follows that also $E(\bar{l}) = \lambda$. Obviously

$$E(l_i - \lambda)(l_j - \lambda) = a_{ij}\sigma^2, \tag{13}$$

where $\sigma^2$ is the variance of those values of $y$ corresponding to a fixed set of values of the $x$'s. From (11), (13) and (9) we obtain

$$E(\bar{l} - \lambda)^2 = \frac{\sigma^2}{\sum_i\sum_j c_{ij}}. \tag{14}$$

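Since the $l_i$ are linear functions of the $y$'s, they have the multivariate normal distribution. From the theory of this distribution and the values (13) of the covariances it follows that the distribution has the form

$$\frac{1}{(2\pi\sigma^2)^{p/2}\sqrt{a}}\,e^{-T/2\sigma^2}\,dl_1 \cdots dl_p,$$

where $a$ is the determinant of the $a_{ij}$'s, and

$$T = \sum_i\sum_j c_{ij}(l_i - \lambda)(l_j - \lambda).$$

We may introduce linear functions $l'_1, \dots, l'_p$ of $l_1 - \lambda, \dots, l_p - \lambda$ such that $T = l_1'^2 + \cdots + l_p'^2$, and such that $l_p'^2 = (\bar{l} - \lambda)^2\sum\sum c_{ij}$. Now

$$\frac{T - l_p'^2}{\sigma^2}$$

has the $\chi^2$ distribution with $p - 1$ degrees of freedom. The numerator of this expression equals

$$\begin{aligned} T - l_p'^2 &= \sum\sum c_{ij}(l_i - \lambda)(l_j - \lambda) - (\bar{l} - \lambda)^2\sum\sum c_{ij} \\ &= \sum\sum c_{ij}l_il_j - \bar{l}^2\sum\sum c_{ij} \\ &= \sum\sum c_{ij}(l_i - \bar{l})(l_j - \bar{l}). \end{aligned}$$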


The penultimate form shows that this function is independent of $\lambda$; the last, as a positive definite form in the deviations of the $l$'s from their weighted mean, shows that sufficiently large values of the expression will reveal with definiteness the inequality of the predicting powers of the $p$ variates when this exists.

It is well known that the regression coefficients of $y$ upon the set of variates $x_1, \dots, x_p$ are completely independent of the sum of squares $Sv^2$ of residuals from the regression equation. Since the $l$'s are linear functions of these regression coefficients (namely the linear functions appearing in the normal equations), they also are independent of $Sv^2$. Hence, if we put

$$s_1^2 = \frac{\sum\sum c_{ij}l_il_j - \bar{l}^2\sum\sum c_{ij}}{p - 1}, \qquad s^2 = \frac{Sv^2}{N - p - 1},$$

the ratio $F = s_1^2/s^2$ will, in case of equality of the correlations of the various $x$'s with $y$, have the variance ratio distribution with $n_1 = p - 1$ and $n_2 = N - p - 1$ degrees of freedom. When $p = 2$ this test reduces exactly to (7), as it should, and $F = t^2$.

In the numerical application of this method, the regression coefficients $b_i$ of $y$ on $x_1, \dots, x_p$ should first be worked out by the inverse matrix method. The right-hand members of the normal equations are $l_1, \dots, l_p$, the coefficients in these equations are the $a_{ij}$, and the calculation of $\sum\sum c_{ij}l_il_j$ is simplified with the help of the identity

$$\sum\sum c_{ij}l_il_j = \sum b_il_i.$$
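The numerical steps above can be sketched in plain Python. Following the convention (1) of the preceding section, the sketch first scales each predictor to zero mean and unit sum of squares (our assumption; with that scaling, equal correlations correspond to equal expectations of the $l_i$). It then solves the normal equations for the $b_i$, obtains $\sum_j c_{ij}$ by solving against a vector of ones, and forms $F = s_1^2/s^2$. As a consistency check on simulated, hypothetical data, the $p = 2$ case is compared with $t^2$ from (7). All function names are ours.

```python
import math
import random

def solve(A, rhs):
    """Solve A z = rhs by Gaussian elimination with partial pivoting."""
    p = len(rhs)
    M = [list(row) + [rhs[i]] for i, row in enumerate(A)]
    for col in range(p):
        piv = max(range(col, p), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, p):
            f = M[r][col] / M[col][col]
            for k in range(col, p + 1):
                M[r][k] -= f * M[col][k]
    z = [0.0] * p
    for r in range(p - 1, -1, -1):
        z[r] = (M[r][p] - sum(M[r][k] * z[k] for k in range(r + 1, p))) / M[r][r]
    return z

def standardize(v):
    mean = sum(v) / len(v)
    dev = [a - mean for a in v]
    norm = math.sqrt(sum(a * a for a in dev))
    return [a / norm for a in dev]

def equal_correlation_F(xs, y):
    """F = s1^2 / s^2 with (p - 1, N - p - 1) degrees of freedom."""
    xs = [standardize(x) for x in xs]              # assumption: scaled as in (1)
    p, N = len(xs), len(y)
    ybar = sum(y) / N
    a = [[sum(u * v for u, v in zip(xi, xj)) for xj in xs] for xi in xs]
    l = [sum(u * v for u, v in zip(xi, y)) for xi in xs]       # (10); x's have mean zero
    b = solve(a, l)                                # regression coefficients b_i
    c_row = solve(a, [1.0] * p)                    # sum over j of c_ij, for each i
    sum_c = sum(c_row)                             # double sum of the c_ij
    lbar = sum(ci * li for ci, li in zip(c_row, l)) / sum_c    # (9) and (11)
    quad = sum(bi * li for bi, li in zip(b, l))    # identity: sum c_ij l_i l_j = sum b_i l_i
    s1_sq = (quad - lbar ** 2 * sum_c) / (p - 1)
    sv_sq = sum((v - ybar) ** 2 for v in y) - quad               # residual sum of squares
    return s1_sq / (sv_sq / (N - p - 1)), (p - 1, N - p - 1)

def corr(u, v):
    return sum(a * b for a, b in zip(standardize(u), standardize(v)))

rng = random.Random(3)
N = 15
x1 = [rng.gauss(0, 1) for _ in range(N)]           # hypothetical predictor values
x2 = [0.5 * a + rng.gauss(0, 1) for a in x1]
y = [0.6 * a + 0.2 * b + rng.gauss(0, 1) for a, b in zip(x1, x2)]
F, dfs = equal_correlation_F([x1, x2], y)

# For p = 2 the text states that F = t^2, with t from (7).
r1, r2, r0 = corr(x1, y), corr(x2, y), corr(x1, x2)
D = 1 - r0**2 - r1**2 - r2**2 + 2 * r0 * r1 * r2
t = (r1 - r2) * math.sqrt((N - 3) * (1 + r0) / (2 * D))
```

Solving against a vector of ones avoids forming the inverse matrix explicitly; for the small $p$ contemplated here either route is adequate.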


6. Selection of Additional Variates When Some Have Been Chosen. Sup- 
pose now that q predictors have been included definitely in the regression equa- 
tion, and that one more is to be selected for inclusion among p additional pre- 
dictors that are available. The criterion now is that that one should be chosen 
tentatively which has the highest partial correlation with the predictand, elimi- 
nating those already definitely chosen ; but the confidence to be placed in the 
choice is to be judged by an adaptation of the criterion of the preceding section. 
It is only necessary to consider the $a_{ij}$, $l_i$, $c_{ij}$, and $b_i$ $(i, j = 1, \dots, p)$ as calculated from the new predictors and the deviations of $y$ from the regression equation on the predictors already adopted. Formulae may easily be derived for the values of these quantities in terms of those already found and the sums of products, so as to simplify the calculations. $Sv^2$ will now stand for the sum of squares of residuals from the regression equation involving all the $p + q$ predictors. It is to be divided by $N - p - q - 1$ to obtain $s^2$. The numbers of degrees of freedom with respect to which $F$ is to be judged are now $n_1 = p - 1$ and $n_2 = N - p - q - 1$. When $p = 2$ this test, like that of the preceding section, reduces to the use of the $t$-distribution of (7), with $n = N - q - 3$, and the correlations standing for partial correlations eliminating the predictors already definitely chosen.

A special instance in which this procedure is applicable is in economic time 
series, in which time, in the form of orthogonal polynomials, must ordinarily be 
“partialled out” in order that tests of significance may be sound.

7. Further Problems. It is natural to ask whether the foregoing work can be extended to examine the soundness of the selection, on the basis of a greater multiple correlation, of a particular set of two or more variates, chosen from among several such sets. The simplest such problem that goes beyond what has been done above deals with two sets, each of two predictors, having in a sample multiple correlations $R$ and $R'$ with the predictand. The question is whether the difference $R - R'$ is significant.

Suppose that, in the interests of simplicity and the hope of attaining a solu- 
tion satisfactorily free from unknown parameters, we assume as before that the 
predictors have a fixed set of values, the same in all samples. Since multiple 
correlations are invariant under linear transformations of predictors, we may without loss of generality assume that the predictors in each set are mutually uncorrelated and have sums of squares equal to unity. Indeed, we may go somewhat further in standardizing the sets of values to which consideration can be confined without loss of generality, with the help of some ideas introduced in the paper [1]. In the terminology of that paper, the variates in each set may be considered canonical with respect to the relationship between the sets. This means that linear functions $x_1$ and $x_2$ of the two variates in one set, and linear functions $x'_1$ and $x'_2$ of those in the other set, can be chosen so as to satisfy not only the conditions

$$\begin{aligned} &Sx_1 = Sx_2 = Sx'_1 = Sx'_2 = 0, \\ &Sx_1^2 = Sx_2^2 = Sx_1'^2 = Sx_2'^2 = 1, \\ &Sx_1x_2 = 0 = Sx'_1x'_2, \end{aligned} \tag{15}$$

but also the further conditions

$$Sx_1x'_2 = 0 = Sx_2x'_1. \tag{16}$$

This means that, for all the purposes in view, the two sets of predictors can be characterized as to their mutual relationships by the values of the remaining two sums of products, namely

$$c_1 = Sx_1x'_1, \qquad c_2 = Sx_2x'_2.$$

In view of the conditions assumed earlier, $c_1$ and $c_2$ are what have been called the canonical correlations between the two sets.

To the sets thus standardized, the predictand $y$ is related in a manner expressed by the population regression coefficients $\beta_1$ and $\beta_2$ of $y$ on the first set, and $\beta'_1$ and $\beta'_2$ on the second. If we take $y$ as having unit variance in the population, the squared multiple correlation coefficients in the two cases will be

$$\rho^2 = \beta_1^2 + \beta_2^2, \qquad \rho'^2 = \beta_1'^2 + \beta_2'^2.$$

The hypothesis to be tested is that $\rho = \rho'$. If $b_1$, $b_2$, $b'_1$, $b'_2$ denote the sample estimates of the regression coefficients, the statistic appropriate for the test would appear necessarily to be proportional to

$$w = \tfrac{1}{2}(b_1^2 + b_2^2 - b_1'^2 - b_2'^2).$$

The sample regression coefficients are normally distributed⁴ with population correlations equal to the sample correlations among the corresponding predictors. The variance of each is $\sigma^2$. Thus their joint distribution may be written down at once, in a rather simple form in view of (15) and (16). From this it is possible to determine directly the characteristic function $M(t) = E(e^{tw})$ of $w$. If we write $K(t) = \log M(t)$ we obtain:

$$2K(t) = \sum_j \frac{t(\beta_j^2 - \beta_j'^2) + \sigma^2t^2(\beta_j^2 - 2c_j\beta_j\beta'_j + \beta_j'^2)}{1 - (1 - c_j^2)\sigma^4t^2} - \sum_j \log\{1 - (1 - c_j^2)\sigma^4t^2\}.$$

Here the summations are with respect to $j$ over the values 1 and 2. If each set of predictors had had $s$ members, the same result would hold for $K(t)$ except that the summations with respect to $j$ would then extend from 1 to $s$.

This is a very disappointing result because it contains so many parameters. The distribution of $w$ must contain the same parameters as its characteristic function. All the four parameters $\beta_j$, $\beta'_j$ appear in the expression above, though their effective number is reduced to three by the condition that the two sums of squares shall be equal, which constitutes the hypothesis under test. The distribution of $w$ thus contains at least three unknown parameters besides $\sigma$.

The estimate of variance obtained from the residuals from the grand regression
equation of $y$ on $x_1$, $x_2$, $x_1'$, and $x_2'$ is independent of $w$. Its distribution
is of the usual form and involves a parameter, the population variance, which
is a function of $\beta_1$, $\beta_2$, $\beta_1'$, and $\beta_2'$. We could therefore pass by a single
integration from the distribution of $w$ to that of the statistic $w/s^2$, where $s^2$ is
this variance estimate. This statistic vanishes with $w$, and on this account, and on
grounds of physical dimensionality, it might be considered appropriate to test the
hypothesis that $\rho = \rho'$. The question may be raised whether the distribution of
this ratio might not be free from parameters. The answer unfortunately is in the
negative, as appears from an examination of the characteristic function of the
ratio. Even in the simplified case in which all the $c_j$ are equal, a troublesome
parameter persists in the distribution.

Thus we meet again the problem of nuisance parameters, and this time no
escape is visible. Perhaps some such artifice as those enumerated in paragraph
3 (for example, some further limitation of the sub-population within which we
should seek the place of our particular sample) is capable of yielding an exact,
or "studentized," distribution, but this has not yet been found. The problem
is of considerable interest, not only because of its practical importance, but
because of its suggestiveness in connection with general theory.

Numerous other problems having both practical importance and general 
theoretical interest are associated with the selection of predictors. For example, 
we have not dealt at all with the problem of the number of predictors that 
should be used when maximum accuracy in prediction, or in evaluation of the 
regression coefficients, is the sole criterion. A particular case is the determination
of the degree of the regression polynomial which should be fitted to obtain
maximum accuracy, for example of the number of orthogonal polynomials in
fitting a trend. Such customary criteria as minimizing the estimated variance 
of deviations, in which the sum of squares which is the numerator and the 
number of degrees of freedom which is the denominator both diminish to zero 
as the number of variates is increased, do not rest upon any satisfactory general 
theory. 

Another related set of problems is concerned with variates more numerous 
than the observations on each. It is clear that there is real information in- 
herent in data of this kind, but existing theory and methods, including those of 
the present paper, are not adequate to utilize it in a thoroughly efficient manner. 
A recent paper of P. L. Hsu [9] is unique in not excluding the case in which the 
variates outnumber the observations. 

8. Summary. A criterion has been obtained for judging the definiteness of
the selection of a particular variate, from among several available for prediction,
on the basis of its having the maximum sample correlation with the predictand.
A variation of this criterion is applied in paragraph 6 to the problem of extending
the list of variates to be used in a regression formula.

Some of the problems of "nuisance parameters" which affect general theory
are illustrated in this problem. Some outstanding unsolved problems related
to these questions are discussed in paragraph 7.
THE FITTING OF STRAIGHT LINES IF BOTH VARIABLES ARE
SUBJECT TO ERROR 

By Abraham Wald 

1. Introduction. The problem of fitting straight lines if both variables $x$
and $y$ are subject to error has been treated by many authors. If we have $N > 2$
observed points $(x_i, y_i)$ ($i = 1, \ldots, N$), the usually employed method of least
squares for determining the coefficients $a$, $b$ of the straight line $y = ax + b$
is that of choosing values of $a$ and $b$ which minimize the sum of the squares of
the residuals of the $y$'s, i.e. $\sum (ax_i + b - y_i)^2$ is a minimum. It is well known
that treating $y$ as an independent variable and minimizing the sum of the
squares of the residuals of the $x$'s, we get a different straight line as best fit. It
has been pointed out¹ that if both variables are subject to error there is no
reason to prefer one of the regression lines described above to the other. For
obtaining the "best fit," which is not necessarily equal to one of the two lines
mentioned, new criteria have to be found. This problem was treated by R. J.
Adcock as early as 1877.²
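The difference between the two least-squares lines is easy to see numerically. In the following Python sketch (the function name and the four data points are illustrative assumptions), the $x$-on-$y$ line $x = cy + d$ is rewritten as a slope $1/c$ in the $y$-versus-$x$ plane so that the two slopes can be compared directly.

```python
def least_squares_slopes(xs, ys):
    """Slopes, in the y-versus-x plane, of the two classical least-squares lines."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope_y_on_x = sxy / sxx   # minimizes the squared residuals of the y's
    slope_x_on_y = syy / sxy   # x = c*y + d with c = sxy/syy, rewritten as dy/dx = 1/c
    return slope_y_on_x, slope_x_on_y


print(least_squares_slopes([1, 2, 3, 4], [1, 3, 2, 4]))  # (0.8, 1.25)
```

The ratio of the two slopes is the squared sample correlation, so they coincide only when the points lie exactly on a line.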

Adcock defines the line of best fit as the one for which the sum of the squares of
the normal deviations of the $N$ observed points from the line becomes a minimum.
(Another early attempt to solve this problem by minimizing the sum of squares
of the normal deviations was made by Karl Pearson.³)

Many objections can be raised against this method. First, there is no justification
for minimizing the sum of the squares of the normal deviations, and not
the deviations in some other direction. Second, the straight line obtained by
that method is not invariant under transformation of the coordinate system.
It is clear that a satisfactory method should give results which do not depend
on the choice of a particular coordinate system. This point has been emphasized
by C. F. Roos. He gives⁴ a good summary of the different methods and
then proposes a general formula for fitting lines (and planes in the case of more than
two variables) which does not depend on the choice of the coordinate system.
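The second objection can be illustrated with the perpendicular-distance criterion of Adcock and Pearson. For two variables the minimizing line passes through the means with the slope of the principal axis of the scatter matrix, which has the closed form used below. The sketch (hypothetical function name, illustrative data) fits the line, changes the unit of $x$ by a factor $k$ (one kind of coordinate transformation), and refits; an invariant method would return exactly the old slope divided by $k$.

```python
import math


def orthogonal_slope(xs, ys):
    """Slope of the line minimizing the sum of squared perpendicular (normal) distances."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    # Principal-axis direction of the 2x2 scatter matrix (requires sxy != 0).
    return (syy - sxx + math.sqrt((syy - sxx) ** 2 + 4 * sxy ** 2)) / (2 * sxy)


xs, ys = [1, 2, 3, 4], [1, 3, 2, 4]
k = 2.0                                          # new unit: x_new = k * x
s_old = orthogonal_slope(xs, ys)                 # 1.0 for these data
s_new = orthogonal_slope([k * x for x in xs], ys)
# Invariance would require s_new == s_old / k = 0.5; here s_new is about 0.433.
```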

¹ See for instance Henry Schultz, "The Statistical Law of Demand," Jour. of Political
Economy, Vol. 33, Dec. (1925).

² Analyst, Vol. IV, p. 183 and Vol. V, p. 53.

³ "On Lines and Planes of Closest Fit to Systems of Points in Space," Phil. Mag. 6th
Ser. Vol. II (1901).

⁴ "A General Invariant Criterion of Fit for Lines and Planes where all Variates are
Subject to Error," Metron, February 1937. See also Oppenheim and Roos, Bulletin of the
American Mathematical Society, Vol. 34 (1928), pp. 140-141.
Roos' formula includes many previous solutions⁵ as special cases. H. E. Jones⁶
gives an interesting geometric interpretation of Roos' general formula.

It is a common feature of Roos' general formula and of all other methods
proposed in recent years that the fitted straight line cannot be determined
without a priori assumptions (independent of the observations) regarding the
weights of the errors in the variables $x$ and $y$. That is to say, the standard
deviations of the errors in $x$ and in $y$ (or at least their ratio) enter the formula
of the fitted straight line, and no method is given by which those standard
deviations can be estimated by means of the observed values of $x$ and $y$.

R. Frisch⁷ has developed a new general theory of linear regression analysis,
when all variables are subject to error. His very interesting theory employs 
quite new methods and is not based on probability concepts. Also on the basis 
of Frisch’s discussion it seems that there is no way of determining the “true” 
regression without a priori assumptions about the disturbing intensities. 

T. Koopmans⁸ combined Frisch's regression theory with the classical one in
a new general theory based on probability concepts. Also, according to his
theory, the regression line can be determined only if the ratio of the standard
deviations of the errors is known.

In a recent paper R. G. D. Allen⁹ gives a new interesting method for determining
the fitted straight line in the case of two variables $x$ and $y$. Denoting by $\sigma_\epsilon$
the standard deviation of the errors in $x$, by $\sigma_\eta$ the standard deviation of the
errors in $y$ and by $\rho$ the correlation coefficient between the errors in the two
variables, Allen emphasizes (p. 194) that the fitted line can be determined only
if the values of two of the three quantities $\sigma_\epsilon$, $\sigma_\eta$, $\rho$ are given a priori.

Finally I should like to mention a paper by C. Eisenhart,¹⁰ which contains
many interesting remarks related to the subject treated here. 

In the present paper I shall deal with the case of two variables x and y in 
which the errors are uncorrelated. It will be shown that under certain con- 
ditions: 

(1) The fitted straight line can be determined without making a priori assumptions
(independent of the observed values $x$ and $y$) regarding the standard
deviations of the errors.

(2) The standard deviation of the errors can be well estimated by means of
the observed values of $x$ and $y$. The precision of the estimate increases with
the number of observations, and the estimate would give the exact values if the
number of observations were infinite. (See in this connection also condition V in
section 3.)

⁵ For instance also Corrado Gini's method described in his paper, "Interpolazione
di una Retta Quando i Valori della Variable Independente sono Affecti da Errori Accidentalis,"
Metron, Vol. I, No. 3 (1921), pp. 63-82.

⁶ "Some Geometrical Considerations in the General Theory of Fitting Lines and Planes,"
Metron, February 1937.

⁷ Statistical Confluence Analysis by Means of Complete Regression Systems, Oslo, 1934.

⁸ Linear Regression Analysis of Economic Time Series, Haarlem, 1937.

⁹ "The Assumptions of Linear Regression," Economica, May 1939.

¹⁰ "The interpretation of certain regression methods and their use in biological and
industrial research," Annals of Math. Stat., Vol. 10 (1939), pp. 162-186.

2. Formulation of the Problem. Let us begin with a precise formulation of
the problem. We consider two sets of random variables¹¹

$$x_1, \ldots, x_N; \quad y_1, \ldots, y_N.$$

Denote the expected value $E(x_i)$ of $x_i$ by $X_i$ and the expected value $E(y_i)$ of
$y_i$ by $Y_i$ ($i = 1, \ldots, N$). We shall call $X_i$ the true value of $x_i$, $Y_i$ the true
value of $y_i$, $x_i - X_i = \epsilon_i$ the error in the $i$-th term of the $x$-set, and $y_i - Y_i = \eta_i$
the error in the $i$-th term of the $y$-set.

The following assumptions will be made:

I. The random variables $\epsilon_1, \ldots, \epsilon_N$ each have the same distribution and they
are uncorrelated, i.e. $E(\epsilon_i\epsilon_j) = 0$ for $i \neq j$. The variance of $\epsilon_i$ is finite.

II. The random variables $\eta_1, \ldots, \eta_N$ each have the same distribution and are
uncorrelated, i.e. $E(\eta_i\eta_j) = 0$ for $i \neq j$. The variance of $\eta_i$ is finite.

III. The random variables $\epsilon_i$ and $\eta_j$ ($i = 1, \ldots, N$; $j = 1, \ldots, N$) are
uncorrelated, i.e. $E(\epsilon_i\eta_j) = 0$.

IV. A single linear relation holds between the true values $X$ and $Y$, that is to
say $Y_i = \alpha X_i + \beta$ ($i = 1, \ldots, N$).

Denote by $\epsilon$ a random variable having the same probability distribution as
possessed by each of the random variables $\epsilon_1, \ldots, \epsilon_N$, and by $\eta$ a random
variable having the same distribution as $\eta_1, \ldots, \eta_N$.

The problem to be solved can be formulated as follows:

We know only two sets of observations: $x_1', \ldots, x_N'$; $y_1', \ldots, y_N'$, where $x_i'$
denotes the observed value of $x_i$ and $y_i'$ denotes the observed value of $y_i$. We
know neither the true values $X_1, \ldots, X_N$; $Y_1, \ldots, Y_N$, nor the coefficients
$\alpha$ and $\beta$ of the linear relation between them. We have to estimate by means
of the observations $x_1', \ldots, x_N'$; $y_1', \ldots, y_N'$ (1) the values of $\alpha$ and $\beta$, (2) the
standard deviation $\sigma_\epsilon$ of $\epsilon$, and (3) the standard deviation $\sigma_\eta$ of $\eta$.

Problems of this kind occur often in Economics, where we are dealing with
time series. For example, denote by $x_i$ the price of a certain good $G$ in the
period $t_i$, and by $y_i$ the quantity of $G$ demanded in $t_i$. In each time period $t_i$
there exists a normal price $X_i$ and a normal demand $Y_i$ which would obtain if
the influence of some accidental disturbances could be eliminated. If we have
reason to assume that there exists between the normal price and the normal
demand a linear relationship, we have to deal with a problem of the kind described
above.

In the following discussions we shall use the notations $x_i$ and $y_i$ also for the
observed values $x_i'$ and $y_i'$, since it will be clear in which sense they are meant
and no confusion can arise.

¹¹ A random or stochastic variable is a real variable associated with a probability
distribution.


3. Consistent Estimates of the Parameters $\alpha$, $\beta$, $\sigma_\epsilon$, $\sigma_\eta$. For the sake of
simplicity we assume that $N$ is even. We consider the expressions

$$a_1 = \frac{(x_1 + \cdots + x_m) - (x_{m+1} + \cdots + x_N)}{N}, \qquad a_2 = \frac{(y_1 + \cdots + y_m) - (y_{m+1} + \cdots + y_N)}{N}, \tag{1}$$

where $m = N/2$. As an estimate of $\alpha$ we shall use the expression

$$a = \frac{a_2}{a_1} = \frac{(y_1 + \cdots + y_m) - (y_{m+1} + \cdots + y_N)}{(x_1 + \cdots + x_m) - (x_{m+1} + \cdots + x_N)}. \tag{2}$$

We make the assumption

V. The limit inferior of

$$\frac{\left|(X_1 + \cdots + X_m) - (X_{m+1} + \cdots + X_N)\right|}{N} \qquad (N = 2, 4, \ldots \text{ ad inf.})$$

is positive.

We shall prove that $a$ is a consistent estimate of $\alpha$, i.e. $a$ converges stochastically
to $\alpha$ as $N \to \infty$, if the assumptions I-V hold. Denote the expected
value of $a_1$ by $\bar{a}_1$ and the expected value of $a_2$ by $\bar{a}_2$. It is obvious that

$$\bar{a}_1 = \frac{(X_1 + \cdots + X_m) - (X_{m+1} + \cdots + X_N)}{N}, \qquad \bar{a}_2 = \frac{(Y_1 + \cdots + Y_m) - (Y_{m+1} + \cdots + Y_N)}{N}. \tag{3}$$

On account of condition IV we have

$$\bar{a}_2 = \alpha\bar{a}_1, \quad \text{or} \quad \frac{\bar{a}_2}{\bar{a}_1} = \alpha. \tag{4}$$

The variance of $a_1 - \bar{a}_1$ is equal to $\sigma_\epsilon^2/N$ and the variance of $a_2 - \bar{a}_2$ is equal
to $\sigma_\eta^2/N$. Hence $a_1$ and $a_2$ converge stochastically towards $\bar{a}_1$ and $\bar{a}_2$ respectively.
From that and assumption V it follows that also $a_2/a_1$ converges stochastically
towards $\bar{a}_2/\bar{a}_1 = \alpha$. The intercept $\beta$ of the regression line will be estimated by

$$b = \bar{y} - a\bar{x}, \quad \text{where} \quad \bar{x} = \frac{x_1 + \cdots + x_N}{N} \quad \text{and} \quad \bar{y} = \frac{y_1 + \cdots + y_N}{N}. \tag{5}$$

Denote by $\bar{X}$ the arithmetic mean of $X_1, \ldots, X_N$ and by $\bar{Y}$ the arithmetic
mean of $Y_1, \ldots, Y_N$. Since $\bar{y}$ converges stochastically towards $\bar{Y}$, $\bar{x}$ towards
$\bar{X}$, and $a$ towards $\alpha$, $b$ converges stochastically towards $\bar{Y} - \alpha\bar{X}$. From condition
IV it follows that $\bar{Y} - \alpha\bar{X} = \beta$. Hence $b$ converges stochastically
towards $\beta$.
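The estimates (2) and (5) are simple enough to sketch directly. The Python below assumes that $G_1$ is the first half and $G_2$ the second half of the data, as in (1); `simulate` is a hypothetical data generator whose true values $X_i$ increase with $i$, so that Assumption V holds for it.

```python
import random


def wald_estimates(xs, ys):
    """Slope a from (1)-(2) and intercept b from (5); G1 = first half, G2 = second half."""
    n = len(xs)
    if n % 2 or n != len(ys):
        raise ValueError("need an even number N of (x, y) pairs")
    m = n // 2
    a1 = (sum(xs[:m]) - sum(xs[m:])) / n    # a_1 in (1)
    a2 = (sum(ys[:m]) - sum(ys[m:])) / n    # a_2 in (1)
    a = a2 / a1                             # slope estimate (2)
    b = sum(ys) / n - a * sum(xs) / n       # intercept estimate (5)
    return a, b


def simulate(n, alpha=2.0, beta=1.0, sig_eps=0.5, sig_eta=1.0, seed=0):
    """Hypothetical data obeying I-V: increasing true X_i, normal errors in x and y."""
    rng = random.Random(seed)
    true_x = [10.0 * i / n for i in range(n)]
    xs = [X + rng.gauss(0, sig_eps) for X in true_x]
    ys = [alpha * X + beta + rng.gauss(0, sig_eta) for X in true_x]
    return xs, ys


a, b = wald_estimates(*simulate(2000))   # should be close to alpha = 2, beta = 1
```

Unlike the two classical least-squares lines, this estimator does not require the ratio of the error variances; it relies instead on the grouping being unrelated to the errors.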

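Let us introduce the following notations:

$s_x = \sqrt{\tfrac{1}{N}\sum_{i=1}^{N}(x_i - \bar{x})^2}$ = sample standard deviation of the $x$-observations,

$s_y = \sqrt{\tfrac{1}{N}\sum_{i=1}^{N}(y_i - \bar{y})^2}$ = sample standard deviation of the $y$-observations,

$s_{xy} = \tfrac{1}{N}\sum_{i=1}^{N}(x_i - \bar{x})(y_i - \bar{y})$ = sample covariance between the $x$-set and the $y$-set.

$S_x$, $S_y$ and $S_{xy}$ denote the same expressions of the true values $X_1, \ldots, X_N$;
$Y_1, \ldots, Y_N$.

It is obvious that

$$E(s_x^2) = S_x^2 + \frac{N-1}{N}\sigma_\epsilon^2, \tag{6}$$

$$E(s_y^2) = S_y^2 + \frac{N-1}{N}\sigma_\eta^2, \tag{7}$$

$$E(s_{xy}) = S_{xy}, \tag{8}$$

where $E(s_x^2)$, $E(s_y^2)$ and $E(s_{xy})$ denote the expected values of $s_x^2$, $s_y^2$ and $s_{xy}$.¹²
Since $Y_i = \alpha X_i + \beta$, we have

$$S_y^2 = \alpha^2 S_x^2, \tag{9}$$

$$S_{xy} = \alpha S_x^2. \tag{10}$$

From (8), (9) and (10) we get

$$S_x^2 = \frac{E(s_{xy})}{\alpha}, \tag{11}$$

$$S_y^2 = \alpha E(s_{xy}). \tag{12}$$

If we substitute in (6) and (7) for $S_x^2$ and $S_y^2$ their values in (11) and (12),
we get

$$\sigma_\epsilon^2 = \left[E(s_x^2) - \frac{E(s_{xy})}{\alpha}\right]\frac{N}{N-1}, \tag{13}$$

$$\sigma_\eta^2 = \left[E(s_y^2) - \alpha E(s_{xy})\right]\frac{N}{N-1}. \tag{14}$$

¹² I observe that the equations (6), (7) and (8) are essentially the same as those investigated
by R. Frisch, Statistical Confluence Analysis, pp. 51-52. See also Allen's equations
(4), l.c. p. 194.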
Since $s_x^2$, $s_y^2$, $s_{xy}$ converge stochastically towards their expected values and $a$
converges stochastically towards $\alpha$, the expressions

$$\left[s_x^2 - \frac{s_{xy}}{a}\right]\frac{N}{N-1} \tag{15}$$

and

$$\left[s_y^2 - a\,s_{xy}\right]\frac{N}{N-1} \tag{16}$$

are consistent estimates of $\sigma_\epsilon^2$ and $\sigma_\eta^2$ respectively.
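A corresponding sketch of (15) and (16), under the same first-half/second-half grouping assumption. The divisor $N$ in $s_x^2$, $s_y^2$ and $s_{xy}$ is what produces the factor $N/(N-1)$; the function name and the generator `simulate` are hypothetical illustrations.

```python
import random


def error_variance_estimates(xs, ys):
    """Estimates (15) of sigma_eps^2 and (16) of sigma_eta^2; G1/G2 = first/second half."""
    n = len(xs)
    m = n // 2
    a = (sum(ys[:m]) - sum(ys[m:])) / (sum(xs[:m]) - sum(xs[m:]))   # slope (2)
    mx, my = sum(xs) / n, sum(ys) / n
    sx2 = sum((x - mx) ** 2 for x in xs) / n                        # s_x^2, divisor N
    sy2 = sum((y - my) ** 2 for y in ys) / n                        # s_y^2
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n      # s_xy
    var_eps = (sx2 - sxy / a) * n / (n - 1)                         # (15)
    var_eta = (sy2 - a * sxy) * n / (n - 1)                         # (16)
    return var_eps, var_eta


def simulate(n, alpha=2.0, beta=1.0, sig_eps=0.5, sig_eta=1.0, seed=5):
    """Hypothetical data with increasing true values and normal errors."""
    rng = random.Random(seed)
    true_x = [10.0 * i / n for i in range(n)]
    xs = [X + rng.gauss(0, sig_eps) for X in true_x]
    ys = [alpha * X + beta + rng.gauss(0, sig_eta) for X in true_x]
    return xs, ys
```

With error-free data the two estimates are zero; for `simulate(20000)` they should be near $\sigma_\epsilon^2 = 0.25$ and $\sigma_\eta^2 = 1$, the approach being slow because the estimates are differences of larger quantities.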


4. Confidence Interval for $\alpha$. In this section, as well as in sections 5 and 6,
only the assumptions I-IV are assumed to hold. In other words, all statements
made in these sections are valid independently of Assumption V, except
where the contrary is explicitly stated.

Let us introduce the following notation:

$$\bar{x}_1 = \frac{x_1 + \cdots + x_m}{m}, \quad \bar{y}_1 = \frac{y_1 + \cdots + y_m}{m}, \quad \bar{x}_2 = \frac{x_{m+1} + \cdots + x_N}{m}, \quad \bar{y}_2 = \frac{y_{m+1} + \cdots + y_N}{m},$$

$$(s_x')^2 = \frac{\sum_{i=1}^{m}(x_i - \bar{x}_1)^2 + \sum_{j=m+1}^{N}(x_j - \bar{x}_2)^2}{N}, \qquad (s_y')^2 = \frac{\sum_{i=1}^{m}(y_i - \bar{y}_1)^2 + \sum_{j=m+1}^{N}(y_j - \bar{y}_2)^2}{N},$$

$$s_{xy}' = \frac{\sum_{i=1}^{m}(x_i - \bar{x}_1)(y_i - \bar{y}_1) + \sum_{j=m+1}^{N}(x_j - \bar{x}_2)(y_j - \bar{y}_2)}{N}.$$

$\bar{X}_1$, $\bar{X}_2$, $\bar{Y}_1$, $\bar{Y}_2$, $(S_x')^2$, $(S_y')^2$ and $S_{xy}'$ denote the same functions of the true
values $X_1, \ldots, X_N$, $Y_1, \ldots, Y_N$. The expressions $s_x'$, $s_y'$ and $s_{xy}'$ are
slightly different from the corresponding expressions $s_x$, $s_y$ and $s_{xy}$. The
reason for introducing these new expressions is that the distributions of $s_x$,
$s_y$ and $s_{xy}$ are not independent of the slope $a = a_2/a_1$ of the sample regression
line, but $s_x'$, $s_y'$ and $s_{xy}'$ are distributed independently of $a$ (assuming that $\epsilon$
and $\eta$ are normally distributed). The latter statement follows easily from the
fact that according to (1) and (2) $a = (\bar{y}_1 - \bar{y}_2)/(\bar{x}_1 - \bar{x}_2)$, and $s_x'$, $s_y'$, $s_{xy}'$ are distributed
independently of $\bar{x}_1$, $\bar{x}_2$, $\bar{y}_1$ and $\bar{y}_2$.
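In the same way as we derived (13) and (14), we get

$$\sigma_\epsilon^2 = \left[E\{(s_x')^2\} - \frac{E(s_{xy}')}{\alpha}\right]\frac{N}{N-2}, \tag{13'}$$

$$\sigma_\eta^2 = \left[E\{(s_y')^2\} - \alpha E(s_{xy}')\right]\frac{N}{N-2}. \tag{14'}$$

These formulae differ from the corresponding formulae (13) and (14) only in
the denominator of the second factor, having there $N - 2$ instead of $N - 1$.
This is due to the fact that the estimates $s_x$, $s_y$, $s_{xy}$ are based on $N - 1$ degrees
of freedom whereas $s_x'$, $s_y'$ and $s_{xy}'$ are based only on $N - 2$ degrees of freedom.
From (13') and (14') we get the following estimates¹³ for $\sigma_\epsilon^2$ and $\sigma_\eta^2$:

$$\left[(s_x')^2 - \frac{s_{xy}'}{\alpha}\right]\frac{N}{N-2}, \tag{17}$$

$$\left[(s_y')^2 - \alpha s_{xy}'\right]\frac{N}{N-2}. \tag{18}$$

Hence we get as an estimate of $\sigma_\eta^2 + \alpha^2\sigma_\epsilon^2$ the expression:

$$s^2 = \left[(s_y')^2 + \alpha^2(s_x')^2 - 2\alpha s_{xy}'\right]\frac{N}{N-2} = \frac{1}{N-2}\left\{\sum_{i=1}^{m}\left[(y_i - \alpha x_i) - (\bar{y}_1 - \alpha\bar{x}_1)\right]^2 + \sum_{j=m+1}^{N}\left[(y_j - \alpha x_j) - (\bar{y}_2 - \alpha\bar{x}_2)\right]^2\right\}. \tag{19}$$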

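Now we shall show that

$$\frac{(N-2)s^2}{\sigma_\eta^2 + \alpha^2\sigma_\epsilon^2} \tag{20}$$

has the $\chi^2$-distribution with $N - 2$ degrees of freedom, provided that $\epsilon$ and $\eta$
are normally distributed. In fact,

$$(y_i - \alpha x_i) - (\bar{y}_1 - \alpha\bar{x}_1) = \eta_i - \alpha\epsilon_i - (\bar{\eta}_1 - \alpha\bar{\epsilon}_1) \qquad (i = 1, \ldots, m)$$

and

$$(y_j - \alpha x_j) - (\bar{y}_2 - \alpha\bar{x}_2) = \eta_j - \alpha\epsilon_j - (\bar{\eta}_2 - \alpha\bar{\epsilon}_2) \qquad (j = m+1, \ldots, N),$$

where

$$\bar{\epsilon}_1 = \frac{\epsilon_1 + \cdots + \epsilon_m}{m}, \quad \bar{\epsilon}_2 = \frac{\epsilon_{m+1} + \cdots + \epsilon_N}{m}, \quad \bar{\eta}_1 = \frac{\eta_1 + \cdots + \eta_m}{m}, \quad \bar{\eta}_2 = \frac{\eta_{m+1} + \cdots + \eta_N}{m}.$$

Since the variance of $\eta_k - \alpha\epsilon_k$ is equal to $\sigma_\eta^2 + \alpha^2\sigma_\epsilon^2$ and since $\eta_k - \alpha\epsilon_k$ is
uncorrelated with $\eta_l - \alpha\epsilon_l$ ($k \neq l$; $k, l = 1, \ldots, N$), the expression (20), a sum of squared
deviations from two group means (each group losing one degree of freedom), has the
$\chi^2$-distribution with $N - 2$ degrees of freedom.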
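The $\chi^2$ statement can be checked by simulation. The Python sketch below assumes normal errors, true values $X_i = 0, 1, \ldots, N-1$ and illustrative parameter values (all hypothetical). For $N = 10$ the sample mean and variance of (20) should be close to $N - 2 = 8$ and $2(N - 2) = 16$, the mean and variance of a $\chi^2$ variable with 8 degrees of freedom.

```python
import random


def chi2_check(n=10, alpha=2.0, beta=1.0, sig_eps=0.5, sig_eta=1.0, reps=20_000, seed=2):
    """Sample mean and variance of (20) = (N-2) s^2 / (sigma_eta^2 + alpha^2 sigma_eps^2)."""
    rng = random.Random(seed)
    m = n // 2
    true_x = [float(i) for i in range(n)]         # hypothetical true values X_i
    scale = sig_eta ** 2 + alpha ** 2 * sig_eps ** 2
    vals = []
    for _ in range(reps):
        x = [X + rng.gauss(0, sig_eps) for X in true_x]
        y = [alpha * X + beta + rng.gauss(0, sig_eta) for X in true_x]
        ss = 0.0                                  # (N - 2) s^2 by the residual form of (19)
        for lo, hi in ((0, m), (m, n)):
            u = [y[i] - alpha * x[i] for i in range(lo, hi)]
            ubar = sum(u) / m
            ss += sum((ui - ubar) ** 2 for ui in u)
        vals.append(ss / scale)
    mean = sum(vals) / reps
    var = sum((v - mean) ** 2 for v in vals) / (reps - 1)
    return mean, var
```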


¹³ An "estimate" is usually a function of the observations not involving any unknown
parameters. We designate here as estimates also some functions involving the parameter $\alpha$.
Now we shall show that

$$\frac{\sqrt{N}\,a_1(a - \alpha)}{\sqrt{\sigma_\eta^2 + \alpha^2\sigma_\epsilon^2}} \tag{21}$$

is normally distributed with zero mean and unit variance. In fact from the
equations (1)-(4) it follows that

$$a_1(a - \alpha) = a_2 - \alpha a_1 = (a_2 - \bar{a}_2) - \alpha(a_1 - \bar{a}_1) = \frac{1}{N}\left[\sum_{i=1}^{m}(\eta_i - \alpha\epsilon_i) - \sum_{j=m+1}^{N}(\eta_j - \alpha\epsilon_j)\right].$$

Since the latter expression is normally distributed (provided that $\epsilon$ and $\eta$ are
normally distributed) with zero mean and variance $(\sigma_\eta^2 + \alpha^2\sigma_\epsilon^2)/N$, our statement
about (21) is proved.

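Obviously (20) and (21) are independently distributed, hence $\sqrt{N-2}$ times
the ratio of (21) to the square root of (20), namely,

$$t = \frac{\sqrt{N}\,a_1(a - \alpha)}{s} = \frac{a_1(a - \alpha)\sqrt{N-2}}{\sqrt{(s_y')^2 + \alpha^2(s_x')^2 - 2\alpha s_{xy}'}}, \tag{22}$$

has the Student distribution with $N - 2$ degrees of freedom. Denote by $t_0$ the
critical value of $t$ corresponding to a chosen probability level. The deviation
of $a$ from an assumed population value $\alpha$ is significant if

$$\left|\frac{a_1(a - \alpha)\sqrt{N-2}}{\sqrt{(s_y')^2 + \alpha^2(s_x')^2 - 2\alpha s_{xy}'}}\right| > t_0.$$

The confidence interval for $\alpha$ can be obtained by solving the equation in $\alpha$,

$$a_1^2(a - \alpha)^2 = \frac{t_0^2}{N-2}\left[(s_y')^2 + \alpha^2(s_x')^2 - 2\alpha s_{xy}'\right]. \tag{23}$$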


Now we shall show that if the relation

$$a_1^2 > \frac{t_0^2}{N-2}(s_x')^2 \tag{24}$$

holds, the roots $\alpha_1$ and $\alpha_2$ are real and $a$ is contained in the interior of the interval
$[\alpha_1, \alpha_2]$. From (19) it follows that

$$(s_y')^2 + \alpha^2(s_x')^2 - 2\alpha s_{xy}' > 0$$

for all values of $\alpha$. Hence, for $\alpha = a$ the left-hand side of (23) is smaller than
the right-hand side. On account of (24), which states that the coefficient of $\alpha^2$ on the
left-hand side of (23) exceeds that on the right-hand side, there exists a value $\alpha' > a$ and a
value $\alpha'' < a$ such that the left-hand side of (23) is greater than the right-hand
side for $\alpha = \alpha'$ and $\alpha = \alpha''$. Hence one root must lie between $a$ and $\alpha'$ and the
other root between $\alpha''$ and $a$. This proves our statement. The relation (24)
always holds for sufficiently large $N$ if Assumption V is fulfilled. The confidence
interval of $\alpha$ is the interval $[\alpha_1, \alpha_2]$. For very small $N$, (24) may not hold.
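Solving (23) amounts to writing it as a quadratic $A\alpha^2 + B\alpha + C = 0$ with $A = a_1^2 - t_0^2(s_x')^2/(N-2)$, so that (24) is simply the condition $A > 0$. The Python sketch below does this for the first-half/second-half grouping; $t_0$ is passed in (for example, read from a table of Student's $t$) rather than computed, and the function names and the generator `simulate` are hypothetical.

```python
import math
import random


def slope_confidence_interval(xs, ys, t0):
    """Roots (alpha_1, alpha_2) of equation (23), or None when condition (24) fails.

    G1 is the first half and G2 the second half of the data; t0 is the critical
    value of Student's t with N - 2 degrees of freedom.
    """
    n = len(xs)
    m = n // 2
    x1, x2 = sum(xs[:m]) / m, sum(xs[m:]) / m
    y1, y2 = sum(ys[:m]) / m, sum(ys[m:]) / m
    a1 = (x1 - x2) / 2                             # a_1 of (1)
    a = (y1 - y2) / (x1 - x2)                      # slope estimate (2)
    dx = [x - (x1 if i < m else x2) for i, x in enumerate(xs)]
    dy = [y - (y1 if i < m else y2) for i, y in enumerate(ys)]
    sxx = sum(d * d for d in dx) / n               # (s'_x)^2
    syy = sum(d * d for d in dy) / n               # (s'_y)^2
    sxy = sum(u * v for u, v in zip(dx, dy)) / n   # s'_xy
    k = t0 ** 2 / (n - 2)
    # (23) written as A*alpha^2 + B*alpha + C = 0.
    A = a1 ** 2 - k * sxx                          # (24) is the condition A > 0
    if A <= 0:
        return None
    B = -2 * a1 ** 2 * a + 2 * k * sxy
    C = a1 ** 2 * a ** 2 - k * syy
    r = math.sqrt(max(B * B - 4 * A * C, 0.0))
    return (-B - r) / (2 * A), (-B + r) / (2 * A)


def simulate(n, alpha=2.0, beta=1.0, sig_eps=0.5, sig_eta=1.0, rng=None):
    """Hypothetical normal-error data with true values X_i = 0, 1, ..., n - 1."""
    if rng is None:
        rng = random.Random(0)
    xs = [i + rng.gauss(0, sig_eps) for i in range(n)]
    ys = [alpha * i + beta + rng.gauss(0, sig_eta) for i in range(n)]
    return xs, ys
```

For $N = 12$ (10 degrees of freedom) the two-sided 5 per cent point is $t_0 \approx 2.228$; over many simulated samples the interval should then contain the true slope about 95 per cent of the time.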

Finally I should like to remark that no essentially better estimate of the
variance of $\eta - \alpha\epsilon$ can be given than the expression $s^2$ in (19). In fact, we
have $2N$ observations $x_1, \ldots, x_N$; $y_1, \ldots, y_N$. For the estimation of the
variance of $\eta - \alpha\epsilon$ we must eliminate the unknowns $X_1, \ldots, X_N$ and $\beta$. (The
unknowns $Y_1, \ldots, Y_N$ are determined by the relations $Y_i = \alpha X_i + \beta$, and $\alpha$ is
involved in the expression whose variance is to be determined.) Hence we have
at most $N - 1$ degrees of freedom, and the estimate in (19) is based on $N - 2$
degrees of freedom.


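5. Confidence Interval for $\beta$ if $\alpha$ is Given. In this case the best estimate of $\beta$
is given by the expression

$$b_\alpha = \bar{y} - \alpha\bar{x}, \quad \text{where} \quad \bar{x} = \frac{x_1 + \cdots + x_N}{N} \quad \text{and} \quad \bar{y} = \frac{y_1 + \cdots + y_N}{N}.$$

We have

$$b_\alpha - \beta = (\bar{y} - \bar{Y}) - \alpha(\bar{x} - \bar{X}) = \bar{\eta} - \alpha\bar{\epsilon},$$

where

$$\bar{\epsilon} = \frac{\epsilon_1 + \cdots + \epsilon_N}{N} \quad \text{and} \quad \bar{\eta} = \frac{\eta_1 + \cdots + \eta_N}{N}.$$

Hence,

$$\frac{\sqrt{N}\,(b_\alpha - \beta)}{\sqrt{\sigma_\eta^2 + \alpha^2\sigma_\epsilon^2}} \tag{25}$$

is normally distributed with zero mean and unit variance. It is obvious that
the expressions (20) and (25) are independently distributed. Hence $\sqrt{N-2}$
times the ratio of (25) to the square root of (20), i.e.

$$t = \frac{\sqrt{N}\,(b_\alpha - \beta)}{s} = \frac{\sqrt{N-2}\,(b_\alpha - \beta)}{\sqrt{(s_y')^2 + \alpha^2(s_x')^2 - 2\alpha s_{xy}'}},$$

has the Student distribution with $N - 2$ degrees of freedom. Denoting by $t_0$
the critical value of $t$ according to the chosen probability level, the confidence
interval for $\beta$ is given by the interval

$$\left[\,b_\alpha - \frac{t_0\sqrt{(s_y')^2 + \alpha^2(s_x')^2 - 2\alpha s_{xy}'}}{\sqrt{N-2}},\;\; b_\alpha + \frac{t_0\sqrt{(s_y')^2 + \alpha^2(s_x')^2 - 2\alpha s_{xy}'}}{\sqrt{N-2}}\,\right].$$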
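The interval for $\beta$ can be sketched in the same style. The function below (hypothetical name; $G_1$ and $G_2$ are the two halves of the data, and $t_0$ is supplied by the caller) uses the identity from (19) that the within-group sum of squares of $y_i - \alpha x_i$, divided by $N$, equals $(s_y')^2 + \alpha^2(s_x')^2 - 2\alpha s_{xy}'$. The data generator is again only an illustration.

```python
import math
import random


def intercept_confidence_interval(xs, ys, alpha, t0):
    """Interval for beta when alpha is given (section 5); G1/G2 = first/second half."""
    n = len(xs)
    m = n // 2
    b_alpha = sum(ys) / n - alpha * sum(xs) / n
    ss = 0.0
    for lo, hi in ((0, m), (m, n)):
        u = [y - alpha * x for x, y in zip(xs[lo:hi], ys[lo:hi])]
        ubar = sum(u) / m
        ss += sum((ui - ubar) ** 2 for ui in u)
    # ss / n equals (s'_y)^2 + alpha^2 (s'_x)^2 - 2 alpha s'_xy, by (19)
    half_width = t0 * math.sqrt(ss / n) / math.sqrt(n - 2)
    return b_alpha - half_width, b_alpha + half_width


def simulate(n, alpha=2.0, beta=1.0, sig_eps=0.5, sig_eta=1.0, rng=None):
    """Hypothetical normal-error data with true values X_i = 0, 1, ..., n - 1."""
    if rng is None:
        rng = random.Random(0)
    xs = [i + rng.gauss(0, sig_eps) for i in range(n)]
    ys = [alpha * i + beta + rng.gauss(0, sig_eta) for i in range(n)]
    return xs, ys
```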
6. Confidence Region for $\alpha$ and $\beta$ Jointly. In most practical cases we want to
know confidence limits for $\alpha$ and $\beta$ jointly. A pair of values $\alpha$, $\beta$ can be represented
in the plane by the point with the coordinates $\alpha$, $\beta$. A region $R$ of this
plane is called a confidence region of the true point $(\alpha, \beta)$ corresponding to the
probability level $P$ if the following two conditions are fulfilled.

(1) The region $R$ is a function of the observations $x_1, \ldots, x_N$; $y_1, \ldots, y_N$,
i.e. it is uniquely determined by the observations.

(2) Before performing the experiment, the probability that we shall obtain
observed values such that $(\alpha, \beta)$ will be contained in $R$ is exactly equal to $P$.
$P$ is usually chosen to be equal to .95 or .99.

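We have shown that the expressions (21) and (25), i.e.

$$\frac{\sqrt{N}\,a_1(a - \alpha)}{\sqrt{\sigma_\eta^2 + \alpha^2\sigma_\epsilon^2}} \quad \text{and} \quad \frac{\sqrt{N}\,(b_\alpha - \beta)}{\sqrt{\sigma_\eta^2 + \alpha^2\sigma_\epsilon^2}},$$

are normally distributed with zero mean and unit variance. Now we shall
show that these two quantities are independently distributed. For this purpose
we have only to show that $\bar{x}$, $\bar{y}$, $a_1$ and $a_2$ are independently distributed ($a_1$ and $a_2$
are defined in (1)), but since

$$a_1 - E(a_1) = \frac{\bar{\epsilon}_1 - \bar{\epsilon}_2}{2}, \qquad a_2 - E(a_2) = \frac{\bar{\eta}_1 - \bar{\eta}_2}{2}, \qquad \bar{x} - E(\bar{x}) = \bar{\epsilon}, \qquad \bar{y} - E(\bar{y}) = \bar{\eta},$$

we have only to show that $\bar{\epsilon}$, $\bar{\eta}$, $\bar{\epsilon}_1 - \bar{\epsilon}_2$, $\bar{\eta}_1 - \bar{\eta}_2$ are independently distributed.

We obviously have

$$\bar{\epsilon} = \frac{\bar{\epsilon}_1 + \bar{\epsilon}_2}{2}, \qquad \bar{\eta} = \frac{\bar{\eta}_1 + \bar{\eta}_2}{2}.$$

It is evident that $\bar{\epsilon}_1$, $\bar{\epsilon}_2$, $\bar{\eta}_1$ and $\bar{\eta}_2$ are independently distributed. Hence
$E[\bar{\epsilon}(\bar{\epsilon}_1 - \bar{\epsilon}_2)] = (E\bar{\epsilon}_1^2 - E\bar{\epsilon}_2^2)/2 = 0$ and also $E[\bar{\eta}(\bar{\eta}_1 - \bar{\eta}_2)] = (E\bar{\eta}_1^2 - E\bar{\eta}_2^2)/2 = 0$.
Since $\bar{\epsilon}_1 - \bar{\epsilon}_2$, $\bar{\eta}_1 - \bar{\eta}_2$, $\bar{\epsilon}$ and $\bar{\eta}$ are jointly normally distributed, the independence
of this set of variables is proved, and therefore also (21) and (25) are independently
distributed. It is obvious that the expression (20) is distributed
independently of (21) and (25). From this it follows that

$$F = \frac{N-2}{2}\cdot\frac{N\left[a_1^2(a - \alpha)^2 + (\bar{y} - \alpha\bar{x} - \beta)^2\right]}{(N-2)s^2} = \frac{(N-2)\left[a_1^2(a - \alpha)^2 + (\bar{y} - \alpha\bar{x} - \beta)^2\right]}{2\left[(s_y')^2 + \alpha^2(s_x')^2 - 2\alpha s_{xy}'\right]}$$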
has the $F$-distribution (analysis of variance distribution) with 2 and $N - 2$
degrees of freedom. The $F$-distribution is tabulated in Snedecor's book: Calculation
and Interpretation of Analysis of Variance, Collegiate Press, Ames, Iowa,
1934. The distribution of $\tfrac{1}{2}\log F = z$ is tabulated in R. A. Fisher's book:
Statistical Methods for Research Workers, London, 1936. Denote by $F_0$ the
critical value of $F$ corresponding to the chosen probability level $P$. Then the
confidence region $R$ is the set of points $(\alpha, \beta)$ which satisfy the inequality


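$$\frac{N-2}{2}\cdot\frac{a_1^2(a - \alpha)^2 + (\bar{y} - \alpha\bar{x} - \beta)^2}{(s_y')^2 + \alpha^2(s_x')^2 - 2\alpha s_{xy}'} < F_0. \tag{27}$$

The boundary of the region is given by the equation

$$a_1^2(a - \alpha)^2 + (\bar{y} - \alpha\bar{x} - \beta)^2 = \frac{2F_0}{N-2}\left[(s_y')^2 + \alpha^2(s_x')^2 - 2\alpha s_{xy}'\right]. \tag{28}$$

This is the equation of an ellipse. Hence the region $R$ is the interior of the
ellipse defined by the equation (28). If Assumption V holds, the lengths of the
axes of the ellipse are of the order $1/\sqrt{N}$; hence with increasing $N$ the ellipse
reduces to a point.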
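Membership in the region $R$ can be tested directly from (27). In the Python sketch below, $F_0$ is supplied by the caller (for 2 and 10 degrees of freedom the 5 per cent point is about 4.10), the grouping is again first half against second half, and the function name and simulated data are assumptions for illustration.

```python
import random


def in_joint_region(xs, ys, alpha, beta, f0):
    """True if (alpha, beta) satisfies inequality (27); f0 is the F(2, N - 2) critical value."""
    n = len(xs)
    m = n // 2
    x1, x2 = sum(xs[:m]) / m, sum(xs[m:]) / m
    y1, y2 = sum(ys[:m]) / m, sum(ys[m:]) / m
    a1 = (x1 - x2) / 2                      # a_1 of (1)
    a = (y1 - y2) / (x1 - x2)               # slope estimate (2)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    ss = 0.0
    for lo, hi, gx, gy in ((0, m, x1, y1), (m, n, x2, y2)):
        ss += sum(((y - alpha * x) - (gy - alpha * gx)) ** 2
                  for x, y in zip(xs[lo:hi], ys[lo:hi]))
    denom = ss / n                          # (s'_y)^2 + alpha^2 (s'_x)^2 - 2 alpha s'_xy
    num = a1 ** 2 * (a - alpha) ** 2 + (ybar - alpha * xbar - beta) ** 2
    return (n - 2) / 2 * num / denom < f0


def simulate(n, alpha=2.0, beta=1.0, sig_eps=0.5, sig_eta=1.0, rng=None):
    """Hypothetical normal-error data with true values X_i = 0, 1, ..., n - 1."""
    if rng is None:
        rng = random.Random(0)
    xs = [i + rng.gauss(0, sig_eps) for i in range(n)]
    ys = [alpha * i + beta + rng.gauss(0, sig_eta) for i in range(n)]
    return xs, ys
```

The point estimate $(a, \bar{y} - a\bar{x})$ always lies inside $R$, since it makes the numerator of (27) vanish.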


7. The Grouping of the Observations. We have divided the observations into
two equal groups $G_1$ and $G_2$, $G_1$ containing the first half $(x_1, y_1), \ldots, (x_m, y_m)$
and $G_2$ the second half $(x_{m+1}, y_{m+1}), \ldots, (x_N, y_N)$ of the observations. All
the formulas and statements of the previous sections remain exactly valid for
any arbitrary subdivision of the observations into two equal groups, provided
that the subdivision is defined independently of the errors $\epsilon_1, \ldots, \epsilon_N$;
$\eta_1, \ldots, \eta_N$. The question arises which grouping is the most advantageous,
i.e. for which grouping $a$ will be the most efficient estimate of $\alpha$ (will lead to
the shortest confidence interval for $\alpha$). It is easy to see that the greater $|\bar{a}_1|$,
the more efficient is the estimate $a$ of $\alpha$. The expression $|\bar{a}_1|$ becomes a maximum
if we order the observations such that $X_1 < X_2 < \cdots < X_N$. That is to
say, $|\bar{a}_1|$ becomes a maximum if we group the observations according to the
following:

Rule I. The point $(x_i, y_i)$ belongs to the group $G_1$ if the number of elements
$x_j$ ($j \neq i$) of the series $x_1, \ldots, x_N$ for which $x_j < x_i$ is less than $m = N/2$. The
point $(x_i, y_i)$ belongs to $G_2$ if the number of elements $x_j$ ($j \neq i$) for which $x_j < x_i$
is greater than or equal to $m$.
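Rule I can be implemented literally by counting, for each point, how many other $x_j$ are smaller. The sketch below (hypothetical function names) assumes the $x$'s are distinct; with ties at the median the rule need not produce two equal groups, and the code then raises an error rather than guess.

```python
def rule_one_groups(xs):
    """Indices of G1 and G2 under Rule I (assumes the x's are distinct)."""
    n = len(xs)
    m = n // 2
    g1, g2 = [], []
    for i, xi in enumerate(xs):
        below = sum(1 for j, xj in enumerate(xs) if j != i and xj < xi)
        (g1 if below < m else g2).append(i)
    if len(g1) != m:
        raise ValueError("tied x values: Rule I does not give two equal groups")
    return g1, g2


def slope_from_groups(xs, ys, g1, g2):
    """a = (ybar_1 - ybar_2) / (xbar_1 - xbar_2) for a given split into G1 and G2."""
    def mean(values, idx):
        return sum(values[i] for i in idx) / len(idx)
    return (mean(ys, g1) - mean(ys, g2)) / (mean(xs, g1) - mean(xs, g2))
```

For distinct values this is the same as sorting by $x$ and putting the lower half into $G_1$.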

This grouping, however, depends on the observed values $x_1, \ldots, x_N$ and is
therefore in general not entirely independent of the errors $\epsilon_1, \ldots, \epsilon_N$. Let us
now consider the grouping according to the following:

Rule II. The point $(x_i, y_i)$ belongs to the group $G_1$ if the number of elements
$X_j$ of the series $X_1, \ldots, X_N$ for which $X_j < X_i$ ($j \neq i$) is less than $m$. The
point $(x_i, y_i)$ belongs to $G_2$ if the number of elements $X_j$ for which $X_j < X_i$ ($j \neq i$)
is equal to or greater than $m$.
The grouping according to Rule II is entirely independent of the errors
$\epsilon_1, \ldots, \epsilon_N$; $\eta_1, \ldots, \eta_N$. It is identical with the grouping according to Rule I
in the following case: Denote by $\tilde{x}$ the median of $x_1, \ldots, x_N$; assume that $\epsilon$
can take values only within the finite interval $[-c, +c]$ and that all the values
$x_1, \ldots, x_N$ fall outside the interval $[\tilde{x} - c, \tilde{x} + c]$. It is easy to see that in
this case $x_i < \tilde{x}$ ($i = 1, \ldots, N$) holds if and only if $X_i < \tilde{X}$, where $\tilde{X}$ denotes
the median of $X_1, \ldots, X_N$. Hence the grouping according to Rule II is
identical to that according to Rule I, and therefore the grouping according to
Rule I is independent of the errors $\epsilon_1, \ldots, \epsilon_N$. In such cases we get the best
estimate of $\alpha$ by grouping the observations according to Rule I. Practically,
we can use the grouping according to Rule I and regard it as independent of the
errors $\epsilon_1, \ldots, \epsilon_N$; $\eta_1, \ldots, \eta_N$ if there exists a positive value $c$ for which the
probability that $|\epsilon| > c$ is negligibly small and the number of observations
contained in $[\tilde{x} - c, \tilde{x} + c]$ is also very small.

Denote by $a'$ the value of $a$ which we obtain by grouping the observations
according to Rule I and by $a''$ the value of $a$ if we group the observations
according to Rule II. The value $a''$ is in general unknown, since the values
$X_1, \ldots, X_N$ are unknown, except in the special case considered above, when
we have $a'' = a'$. We will now show that an upper and a lower limit for $a''$
can always be given. First, we have to determine a positive value $c$ such that
the probability that $|\epsilon| > c$ is negligibly small. The value of $c$ may often be
determined before we make the observations, from some a priori knowledge
about the possible range of the errors. If this is not the case, we can estimate
the value of $c$ from the data. It is well known that if we have errors in both
variables and fit a straight line by the method of least squares minimizing in
the $x$-direction, the sum of the squared deviations divided by the number of
degrees of freedom will overestimate $\sigma_\epsilon^2$. Hence, if $\epsilon$ is normally distributed,
we can consider the interval $[-3v, 3v]$ as the possible range of $\epsilon$, i.e. $c = 3v$,
where $v^2$ denotes the sum of the squared residuals divided by the number of
degrees of freedom. If the distribution of $\epsilon$ is unknown, we shall have to take
for $c$ a somewhat larger value, for instance $c = 5v$. After having determined $c$,
upper and lower limits for $a''$ can be given as follows: we consider the system $S$
of all possible groupings into two equal groups satisfying the conditions:

(1) If $x_i < \tilde{x} - c$, the point $(x_i, y_i)$ belongs to the group $G_1$.

(2) If $x_i > \tilde{x} + c$, the point $(x_i, y_i)$ belongs to the group $G_2$.

We calculate the value of $a$ according to each grouping of the system $S$ and
denote the minimum of these values by $a^*$, and the maximum by $a^{**}$. Since
the grouping according to Rule II is contained in the system $S$, $a^*$ is a lower
and $a^{**}$ an upper limit of $a''$.
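The bounds $a^*$ and $a^{**}$ can be obtained by brute force: the points outside $[\tilde{x} - c, \tilde{x} + c]$ are fixed by conditions (1) and (2), and every way of filling the remaining places in $G_1$ with points from inside the interval is tried. In this sketch (hypothetical function name) $\tilde{x}$ is taken as `statistics.median`, the mean of the two middle values for even $N$, which is an assumption. The number of groupings grows combinatorially with the number of points near the median, which is why the amount of computation can become considerable.

```python
import itertools
import statistics


def slope_bounds(xs, ys, c):
    """(a*, a**): min and max of a over the system S of equal groupings."""
    n = len(xs)
    m = n // 2
    med = statistics.median(xs)
    forced_g1 = [i for i, x in enumerate(xs) if x < med - c]   # condition (1)
    forced_g2 = [i for i, x in enumerate(xs) if x > med + c]   # condition (2)
    free = [i for i in range(n) if i not in forced_g1 and i not in forced_g2]
    need = m - len(forced_g1)          # free points still to be placed in G1

    def mean(values, idx):
        return sum(values[i] for i in idx) / len(idx)

    slopes = []
    for extra in itertools.combinations(free, need):
        g1 = forced_g1 + list(extra)
        g2 = [i for i in range(n) if i not in g1]
        slopes.append((mean(ys, g1) - mean(ys, g2)) / (mean(xs, g1) - mean(xs, g2)))
    return min(slopes), max(slopes)
```

Because the Rule I grouping always belongs to $S$, its slope $a'$ lies between the two bounds.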

Let $g$ be a grouping contained in $S$ and denote by $I_g$ the confidence interval
for $\alpha$ which we obtain from formula (23) using the grouping $g$. Denote further
by $I$ the smallest interval which contains the intervals $I_g$ for all elements $g$
of $S$. Then $I$ contains also the confidence interval corresponding to the grouping
according to Rule II. If we denote by $P$ the chosen probability level (say
$P = .95$), then we can say: If we were to draw a sample consisting of $N$ pairs
of observations $(x_1, y_1), \ldots, (x_N, y_N)$, the probability is greater than or equal
to $P$ that we shall obtain a system of observations such that the interval $I$ will
include the true slope $\alpha$.

The computing work for the determination of $I$ may be considerable if the
number of observations within the interval $[\tilde{x} - c, \tilde{x} + c]$ is not small. We
can get a good approximation to $I$ with less computing work as follows: First
we calculate the slope $a'$ using the grouping according to Rule I and determine
the confidence interval $[a' - \delta, a' + \Delta]$ according to formula (23). Denote by
$a(g)$ the value of the slope, i.e. the value of $(\bar{y}_1 - \bar{y}_2)/(\bar{x}_1 - \bar{x}_2)$, corresponding to a grouping
$g$ of the system $S$, and by $[a(g) - \delta_g, a(g) + \Delta_g]$ the corresponding confidence
interval calculated from (23). Neglecting the differences $(\delta_g - \delta)$ and $(\Delta_g - \Delta)$,
we obtain for $I$ the interval $[a^* - \delta, a^{**} + \Delta]$.

If the difference $a^{**} - a^*$ is small, we can consider $I = [a^* - \delta, a^{**} + \Delta]$ as
the correct confidence interval of $\alpha$ corresponding to the chosen probability
level $P$. If, however, $a^{**} - a^*$ is large, the interval $I$ is unnecessarily large.
In such cases we may get a much shorter confidence interval by using some
other grouping defined independently of the errors $\epsilon_1, \ldots, \epsilon_N$; $\eta_1, \ldots, \eta_N$.
For instance, if we see that the values $x_1, \ldots, x_N$, considered in the order in which
they have been observed, show a monotonically increasing (or decreasing) tendency,
we shall define the group $G_1$ as the first half, and the group $G_2$ as the
second half of the observations. Though we decide to make this grouping after
having observed that the values $x_1, \ldots, x_N$ show a clear trend, the grouping
can be considered as independent of the errors $\epsilon_1, \ldots, \epsilon_N$. In fact, if the
range $c$ of the error is small in comparison to the true part $X$, the trend tendency
of the values $x_1, \ldots, x_N$ will not be affected by the size of the errors $\epsilon_1, \ldots, \epsilon_N$.

We may use for the grouping also any other property of the data which is
independent of the errors.

The results of the preceding considerations can be summarized as follows. We first use the grouping according to Rule I and calculate the slope

$$a' = \frac{\bar y_1 - \bar y_2}{\bar x_1 - \bar x_2}$$

and the corresponding confidence interval $[a' - \delta, a' + \Delta]$ (formula (23)). This confidence interval cannot be considered exact, since the grouping according to Rule I is not completely independent of the errors. In order to take account of this fact, we calculate $a^*$ and $a^{**}$. If $a^{**} - a^*$ is small, we consider $I = [a^* - \delta, a^{**} + \Delta]$, as a practical approximation, to be the correct confidence interval. If, however, $a^{**} - a^*$ is large, the interval I is unnecessarily large. We can only say that I is a confidence interval corresponding to a probability level greater than or equal to the chosen one. In such cases we should try some other grouping defined independently of the errors, which may lead to a considerably shorter confidence interval.
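A minimal sketch of the grouping estimate $a' = (\bar y_1 - \bar y_2)/(\bar x_1 - \bar x_2)$, together with the intercept $b' = \bar y - a'\bar x$. It assumes that Rule I (defined earlier in the paper, not in this excerpt) splits the observations at the median of the x's into two halves of equal size, with N even. The function name `grouping_slope` is hypothetical, and the confidence limits of formula (23) are not reproduced.

```python
from statistics import mean


def grouping_slope(xs, ys):
    """Grouping estimates a' = (ybar1 - ybar2)/(xbar1 - xbar2) and b' = ybar - a' xbar.

    Assumption for this sketch: Rule I puts the N/2 observations with the
    smallest x into G1 and the N/2 with the largest x into G2 (N even).
    """
    n = len(xs)
    if n != len(ys) or n % 2:
        raise ValueError("need an even number of (x, y) pairs")
    order = sorted(range(n), key=lambda i: xs[i])
    g1, g2 = order[: n // 2], order[n // 2:]
    x1, x2 = mean(xs[i] for i in g1), mean(xs[i] for i in g2)
    y1, y2 = mean(ys[i] for i in g1), mean(ys[i] for i in g2)
    a = (y1 - y2) / (x1 - x2)
    b = mean(ys) - a * mean(xs)
    return a, b


# Error-free points on y = 2x + 1 recover the line exactly.
print(grouping_slope([0, 1, 2, 3, 4, 5], [1, 3, 5, 7, 9, 11]))  # -> (2.0, 1.0)
```

With errors present, the same function gives $a'$; the quantities $a^*$ and $a^{**}$ of the text would come from re-running it over the admissible groupings g.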

Analogous considerations hold for the joint confidence region for $\alpha$ and $\beta$. We use the grouping according to Rule I and calculate from (27) the corresponding confidence region R. If $|a^{**} - a^*|$ and $|b^{**} - b^*|$ are small ($b^* = \bar y - a^*\bar x$ and $b^{**} = \bar y - a^{**}\bar x$), we enlarge R to a region $\bar R$ corresponding to the fact that a and b may take any values within the intervals $[a^{**}, a^*]$ and $[b^{**}, b^*]$ respectively. The region $\bar R$ can be considered, as a practical approximation, to be the correct confidence region. If $|a^{**} - a^*|$ or $|b^{**} - b^*|$ is large, we may try some other grouping defined independently of the errors, which may lead to a smaller confidence region. In any case $\bar R$ represents a confidence region corresponding to a probability level greater than or equal to the chosen one.

8. Some Remarks on the Consistency of the Estimates of $\alpha$, $\beta$, $\sigma_\epsilon$, $\sigma_\eta$. We have shown in section 3 that the given estimates of $\alpha$, $\beta$, $\sigma_\epsilon$ and $\sigma_\eta$ are consistent if condition V is satisfied.

If the values $X_1, \ldots, X_N$ are not obtained by random sampling, it will in general be possible to define a grouping which is independent of the errors and for which condition V is satisfied. We can sometimes arrange the experiments so that no value of the series $X_1, \ldots, X_N$ lies within the interval $[\tilde x - c, \tilde x + c]$, where $\tilde x$ denotes the median of $x_1, \ldots, x_N$ and c the range of the error $\epsilon$. In such cases, as we saw, the grouping according to Rule I is independent of the errors. Condition V is certainly satisfied if we group the data according to Rule I.

Let us now consider the case in which $X_1, \ldots, X_N$ are independently distributed random variables, each having the same distribution. Denote by X a random variable having the same probability distribution as each of the random variables $X_1, \ldots, X_N$. Assuming that X has a finite second moment, the expression in condition V will approach zero stochastically as $N \to \infty$ for any grouping defined independently of the values $x_1, \ldots, x_N$. It is possible, however, to define a grouping independent of the errors (but not independent of $x_1, \ldots, x_N$) for which the expression in V does not approach zero, provided that X has the following property: there exists a real value $\lambda$ such that the probability that X will lie within the interval $[\lambda - c, \lambda + c]$ (c denotes the range of the error $\epsilon$) is zero, the probability that $X > \lambda + c$ is positive, and the probability that $X < \lambda - c$ is positive. The grouping can be defined, for instance, as follows:

The i-th observation $(x_i, y_i)$ belongs to the group $G_1$ if $x_i < \lambda$ and to $G_2$ if $x_i > \lambda$. We continue the grouping according to this rule up to a value i for which one of the groups $G_1$, $G_2$ already contains N/2 elements. All further observations belong to the other group.

It is easy to see that, with probability 1, the relation $x_i < \lambda$ is equivalent to the relation $X_i < \lambda - c$, and the relation $x_i > \lambda$ is equivalent to the relation $X_i > \lambda + c$. Hence this grouping is independent of the errors. Since condition V is satisfied for this grouping, our statement is proved.
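The sequential grouping rule just described can be sketched as follows. The function name and the index-list return value are illustrative choices, and N is assumed even so that N/2 is an integer.

```python
def threshold_grouping(xs, lam):
    """Sequential grouping about a threshold lam, as described in the text.

    Observation i goes to G1 if x_i < lam and to G2 if x_i > lam, until one
    group holds N/2 observations; every later observation then goes to the
    other group. Returns the index lists (G1, G2).
    """
    n = len(xs)
    if n % 2:
        raise ValueError("this sketch assumes an even number of observations")
    half = n // 2
    g1, g2 = [], []
    for i, x in enumerate(xs):
        if len(g1) == half:      # G1 is full: all further observations go to G2
            g2.append(i)
        elif len(g2) == half:    # G2 is full: all further observations go to G1
            g1.append(i)
        elif x < lam:
            g1.append(i)
        else:
            # x > lam; x == lam has probability zero under the stated property of X
            g2.append(i)
    return g1, g2


print(threshold_grouping([0.1, 0.9, 0.2, 0.3, 0.8, 0.05], lam=0.5))
# -> ([0, 2, 3], [1, 4, 5])
```

Note that the last observation (0.05) lands in $G_2$ although it is below $\lambda$, because $G_1$ was already full; this is exactly the "all further observations belong to the other group" clause.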

If X does not have the property described above, it may happen that for every grouping defined independently of the errors, the expression in condition V always converges to zero stochastically. Such a case arises, for instance, if X, $\epsilon$ and $\eta$ are normally distributed.* It can be shown that in this case no consistent estimates of the parameters $\alpha$ and $\beta$ can be given unless we have some additional information not contained in the data (for instance, if we know the ratio $\sigma_\epsilon/\sigma_\eta$ a priori).

9. Structural Relationship and Prediction.** The problem discussed in this paper was how to estimate the relationship between the true parts X and Y. We shall call the relationship between the true parts the structural relationship. The problem of finding the structural relationship must not be confused with the problem of predicting one variable by means of the other. The problem of prediction can be formulated as follows: We have observed N pairs of values $(x_1, y_1), \ldots, (x_N, y_N)$. A new observation on x is given, and we have to estimate the corresponding value of y by means of our previous observations $(x_1, y_1), \ldots, (x_N, y_N)$. One might think that if we have estimated the structural relationship between X and Y, we may estimate y by the same relationship. That is to say, if the estimated structural relationship is given by $Y = aX + b$, we may estimate y from x by the same formula: $y = ax + b$. This procedure may, however, lead to a biased estimate of y. This is the case, for instance, if X, $\epsilon$ and $\eta$ are normally distributed. It can easily be shown in this case that for any given x the conditional expectation of y is a linear function of x, that the slope of this function is different from the slope of the structural relationship, and that among all unbiased estimates of y which are linear functions of x, the estimate obtained by the method of least squares has the smallest variance. Hence in this case we have to use the least squares estimate for purposes of prediction. Even if we knew the structural relationship $Y = \alpha X + \beta$ exactly, we would get a biased estimate of y by putting $y = \alpha x + \beta$.
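The bias described above can be illustrated by simulation. This sketch assumes independent normal X, $\epsilon$ and $\eta$ with the hypothetical parameter values shown. It relies on the standard result that, in this case, the least-squares slope of y on x tends to $\alpha\sigma_X^2/(\sigma_X^2 + \sigma_\epsilon^2)$ rather than to the structural slope $\alpha$.

```python
import random


def ls_slope(alpha=2.0, beta=1.0, sd_X=1.0, sd_eps=1.0, sd_eta=0.5, n=100_000, seed=0):
    """Least-squares slope of y on x when X, eps, eta are independent normals."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        X = rng.gauss(0.0, sd_X)
        xs.append(X + rng.gauss(0.0, sd_eps))                 # x = X + eps
        ys.append(alpha * X + beta + rng.gauss(0.0, sd_eta))  # y = alpha X + beta + eta
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


slope = ls_slope()
theory = 2.0 * 1.0 ** 2 / (1.0 ** 2 + 1.0 ** 2)  # alpha * var(X) / (var(X) + var(eps))
print(round(slope, 3), theory)
```

With these values the least-squares slope is close to 1, half the structural slope $\alpha = 2$. This is why the least-squares line, not the structural line, is the appropriate predictor in the normal case.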

Let us now consider the following example: X is a random variable having a rectangular distribution with the range [0, 1]. The random variable $\epsilon$ has a rectangular distribution with the range [-0.1, +0.1]. For any given x let us denote the conditional expectation of y by $E(y \mid x)$ and the conditional expectation of X by $E(X \mid x)$. Then we obviously have

$$E(y \mid x) = \alpha E(X \mid x) + \beta.$$

Now let us calculate $E(X \mid x)$. It is obvious that the joint distribution of X and $\epsilon$ is given by the density function

$$5\, dX\, d\epsilon,$$


* I wish to thank Professor Hotelling for drawing my attention to this case.

** I should like to express my thanks to Professor Hotelling for many interesting suggestions and remarks on this subject.
where X can take any value within the interval [0, 1] and $\epsilon$ can take any value within [-0.1, +0.1]. From this we easily obtain that the joint distribution of x and X is given by the density function

$$5\, dx\, dX,$$

where x can take any value within the interval [-0.1, 1.1] and X can take any value lying in both intervals [0, 1] and [x - 0.1, x + 0.1] simultaneously. Denote by $I_x$ the common part of these two intervals. Then for any fixed x the conditional distribution of X is given by the probability density

$$\frac{dX}{\int_{I_x} dX}.$$

Hence we have

$$E(X \mid x) = \frac{\int_{I_x} X\, dX}{\int_{I_x} dX}.$$

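We have to consider three cases:

(1) $0.1 < x < 0.9$. In this case $I_x = [x - 0.1, x + 0.1]$ and

$$E(X \mid x) = \frac{\int_{x-0.1}^{x+0.1} X\, dX}{\int_{x-0.1}^{x+0.1} dX} = x.$$

(2) $-0.1 < x < 0.1$. Then $I_x = [0, x + 0.1]$ and

$$E(X \mid x) = \frac{\int_{0}^{x+0.1} X\, dX}{\int_{0}^{x+0.1} dX} = 0.5x + 0.05.$$

(3) $0.9 < x < 1.1$. Then $I_x = [x - 0.1, 1]$ and

$$E(X \mid x) = \frac{\int_{x-0.1}^{1} X\, dX}{\int_{x-0.1}^{1} dX} = 0.5x + 0.45.$$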
Since

$$E(y \mid x) = \alpha E(X \mid x) + \beta,$$

we see that the structural relationship gives an unbiased prediction of y from x if $0.1 < x < 0.9$, but not in the other cases.
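The three cases can be checked numerically. In this sketch (helper names hypothetical), $E(X \mid x)$ is computed as the midpoint of $I_x$, which is what the ratio of integrals above reduces to for a uniform conditional density. A crude Monte Carlo estimate averages X over simulated draws whose x falls in a narrow window around a chosen point.

```python
import random


def cond_mean_X(x):
    """E(X | x) for X ~ U[0, 1], eps ~ U[-0.1, 0.1] independent, x = X + eps.

    Given x, X is uniform on I_x = [0, 1] intersected with [x - 0.1, x + 0.1],
    so E(X | x) is the midpoint of I_x.
    """
    if not -0.1 <= x <= 1.1:
        raise ValueError("x must lie in [-0.1, 1.1]")
    lo, hi = max(0.0, x - 0.1), min(1.0, x + 0.1)
    return (lo + hi) / 2


def mc_cond_mean(x0, h=0.005, n=100_000, seed=1):
    """Monte Carlo check: average X over simulated draws with |x - x0| < h."""
    rng = random.Random(seed)
    total, count = 0.0, 0
    for _ in range(n):
        X = rng.random()
        x = X + rng.uniform(-0.1, 0.1)
        if abs(x - x0) < h:
            total += X
            count += 1
    return total / count


for x0 in (0.0, 0.5, 1.0):
    print(x0, cond_mean_X(x0), round(mc_cond_mean(x0), 3))
```

At $x = 0$ and $x = 1$ the formulas of cases (2) and (3) give 0.05 and 0.95, not the values 0 and 1 that the structural relationship would use.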

The question of the cases in which the structural relationship is appropriate also for purposes of prediction needs further investigation. I should like to mention a class of cases where the structural relationship has to be used also for prediction. Assume that we have observed N values $(x_1, y_1), \ldots, (x_N, y_N)$ of the variables x and y for which the conditions I-IV of section 2 hold. Then we make a new observation on x, obtaining the value $x'$. We assume that the last observation on x has been made under changed conditions such that we are sure that $x'$ does not contain error, i.e. $x'$ is equal to the true part $X'$. Such a situation may arise, for instance, if the error $\epsilon$ is due to errors of measurement and the last observation has been made with an instrument of great precision for which the error of measurement can be neglected. In such cases the prediction of the corresponding $y'$ has to be made by means of the estimated structural relationship, i.e. we have to put $y' = ax' + b$.

The knowledge of the structural relationship is essential for constructing any 
theory in the empirical sciences. The laws of the empirical sciences mostly 
express relationships among a limited number of variables which would prevail 
exactly if the disturbing influence of a great number of other variables could 
be eliminated. In our experiments we never succeed in eliminating completely 
these disturbances. Hence in deducing laws from observations, we have the 
task of estimating structural relationships. 

Columbia University, 

New York, N. Y. 



A METHOD FOR MINIMIZING THE SUM OF ABSOLUTE VALUES 

OF DEVIATIONS 


By Robert R. Singleton 

1. Introduction. In the Philosophical Magazine, 7th series, May 1930, E. C. Rhodes described a method of computation for the estimation of parameters by minimizing the sum of absolute values of deviations. His is an iterative and recursive method, in the following sense. There is a direct method for minimization with one parameter. Assuming a method for minimization with n - 1 parameters, Rhodes imposes a relation between the n parameters (in an n-parameter problem) and finds a restricted minimum by the method for n - 1 parameters. In this sense his method is recursive. He then repeats the process, imposing on the n parameters a new relation determined by the restricted minimum. In this sense his method is iterative. The process is finite, ending when a restricted minimum immediately succeeds itself, indicating a true minimum.

Rhodes' paper presents the method without proof. The purpose of the present paper is to analyze the situation in sufficient detail to indicate proofs for various methods, and to present a new method which reduces the labor of solution by eliminating the recursive feature. The iterative approach is retained. The solution of Rhodes' illustrative problem will be given for comparison between the two methods.

The paper uses geometric terminology and develops to quite an extent the geometry of a surface representing the summed absolute deviations. This seems the clearest means of presenting the relationships. Further analysis of the properties of this surface should lead to an even more direct method for attaining the minimum than the one presented here.

In writing the paper, no attention has been given to sets of observations or equations among which a linear dependence may exist. In practice, such a situation almost never occurs. If the need arises, the adjustments which must be made to take care of dependence are in each case fairly obvious.

2. Geometric Analogue of Summed Absolute Deviations. Let n observations on $\nu + 1$ variates be represented by $x^i_\alpha$, $y^i$, where $i = 1, \ldots, n$; $\alpha = 1, \ldots, \nu$. Unless otherwise noted, Latin indices have range 1 to n, Greek indices 1 to $\nu$. The summation convention of tensor analysis is used.

The variates are to be statistically related by the linear function¹

$$\hat y^i = x^i_\alpha u^\alpha,$$

¹ This includes the linear function with a constant, since a variate identically equal to 1 may be used.



$\hat y^i$ being an estimate of $y^i$; the $u^\alpha$ are to be determined so that $\sum_i |\hat y^i - y^i|$ is a minimum. Set

(1) $v^i = x^i_\alpha u^\alpha - y^i$

and determine functions $e^i(u^\alpha)$ so that $e^i v^i \ge 0$ (no summation on i) and $|e^i| = 1$. It is immaterial that $e^i$ is not uniquely determined when $u^\alpha$ satisfies $v^i = 0$. Then $v = \sum_i e^i v^i$ is to be minimized. Using (1),

(2) $v = X_\alpha u^\alpha - Y$,

where

$$X_\alpha = \sum_i e^i x^i_\alpha, \qquad Y = \sum_i e^i y^i.$$

Consider a Euclidean $(\nu + 1)$-space, $E_{\nu+1}$, with coordinates $u^1, \ldots, u^\nu, v$. The coordinate hyperplane perpendicular to the v-axis will be called $E_\nu$. In $E_{\nu+1}$ each of equations (1) for a particular i represents a $\nu$-plane which intersects $E_\nu$ in a $(\nu - 1)$-plane where $v^i = 0$. Each of the equations

(3) $|v^i| = e^i(x^i_\alpha u^\alpha - y^i)$

represents two half-planes which touch $E_\nu$ and each other along the $(\nu - 1)$-plane given in $E_\nu$ by the equation

(4) $x^i_\alpha u^\alpha - y^i = 0$.

The functions on the right-hand side of (3) are thus continuous everywhere, and linear in any neighborhood in $E_\nu$ none of whose points satisfies (4). Since a sum of functions continuous and linear in a neighborhood is also continuous and linear in that neighborhood, it follows that the function on the right in (2) is continuous for all u, and linear in every neighborhood in $E_\nu$ containing no points which satisfy (4) for any i. Hence

Observation I: The surface S given in $E_{\nu+1}$ by (2) consists of portions of $\nu$-planes joined together. The projection of these joins on $E_\nu$ forms a network of $(\nu - 1)$-planes determined in $E_\nu$ by equations (4).
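Observation I can be illustrated with a small sketch. With the signs $e^i$ held fixed at their values at a point $u_0$, equation (2) is an exact linear function that agrees with the sum of absolute deviations on the face containing $u_0$, and stops agreeing once some deviation changes sign. The data are invented for the illustration, and +1 is chosen for $e^i$ when $v^i = 0$ (the text notes this choice is immaterial).

```python
def deviations(X, y, u):
    """v^i = x^i_alpha u^alpha - y^i, equation (1)."""
    return [sum(xa * ua for xa, ua in zip(row, u)) - yi for row, yi in zip(X, y)]


def signs(X, y, u):
    """e^i with e^i v^i >= 0 and |e^i| = 1 (+1 chosen where v^i = 0)."""
    return [1 if v >= 0 else -1 for v in deviations(X, y, u)]


def v_direct(X, y, u):
    """Sum of absolute deviations."""
    return sum(abs(v) for v in deviations(X, y, u))


def v_linear(X, y, u, e):
    """Equation (2), v = X_alpha u^alpha - Y, with the signs e held fixed."""
    Xa = [sum(ei * row[a] for ei, row in zip(e, X)) for a in range(len(u))]
    Y = sum(ei * yi for ei, yi in zip(e, y))
    return sum(xa * ua for xa, ua in zip(Xa, u)) - Y


X = [[1, 0], [1, 1], [1, 2], [1, 3]]   # first column is the constant variate
y = [1, 2.5, 2.8, 4.2]
u0 = [0.9, 1.0]
e0 = signs(X, y, u0)
for u in ([0.9, 1.0], [0.91, 1.01], [1.2, 1.0]):
    # the two columns agree on the face of u0 and differ once a bend is crossed
    print(u, round(v_direct(X, y, u), 6), round(v_linear(X, y, u, e0), 6))
```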

3. Existence of a Minimum. Define a "bend of degree r on S" to be the locus of all points on S whose u-coordinates satisfy a set of r independent equations of (4). To each set of r independent equations corresponds a unique bend of degree r.

If a linear relation $u^\alpha = a^\alpha_\rho \lambda^\rho + b^\alpha$, $\rho = 1, \ldots, p$, $p < \nu$, $\mathrm{rank}(a^\alpha_\rho) = p$, is imposed on $u^\alpha$, all the preceding development, reduced in dimension, applies to the new variates $x^i_\alpha a^\alpha_\rho$, $y^i - x^i_\alpha b^\alpha$.

Observation II: A section of S by a plane of any dimension $d < \nu$ has all the properties of an S-surface of dimension d.

Since any set of consistent equations selected from (4) determines such a linear relation for $u^\alpha$, the application of Observation I to any of the bends of S shows that each r-bend consists of linear elements of dimension $\nu - r$, joined at points which lie on linear elements of lesser dimension. Thus S is a polyhedron. Its faces we term complexes of dimension $\nu$, $C_\nu$, and the linear elements of its edges which lie wholly in bends of degree r, but not of degree r + 1, are complexes $C_{\nu-r}$ of dimension $\nu - r$. The boundary of any $C_s$, $s > 0$, consists of complexes of lesser dimension. The term complex is not restricted to either open or closed complexes.

Since the function $v(u^\alpha)$ of (2) is non-negative, it possesses a greatest lower bound (g.l.b.) g. Since for some number $h > g$ there exists an N such that $v(u^\alpha) > h$ for all $|u^\alpha| > N$, it follows that on some closed bounded region of $E_\nu$ the g.l.b. of v is g. Since v is continuous everywhere, it attains its g.l.b., and so S has minimum points. Since the minimum of any complex not parallel to $E_\nu$ lies on its boundary, and the boundary consists of complexes, it follows that the minimum points of S consist of $C_0$'s and/or entire complexes of dimension > 0 which are parallel to $E_\nu$. The next section will show that S has a unique minimum complex (including, of course, its boundary complexes) and furthermore is cup-shaped.





4. Convexity Property; Uniqueness of the Minimum. Consider $\nu = 1$ in the preceding treatment (the single index value $\alpha = 1$ is, for convenience, not written). S looks generally like Fig. 1. The slope changes only where an equation of (4) has a root. Suppose the point is $u_0$, and $x^i u_0 - y^i = 0$. From (3), since $e^i v^i \ge 0$, it follows that $e^i x^i < 0$ for $u < u_0$ and $e^i x^i > 0$ for $u > u_0$. Since in (2) $X = \sum_i e^i x^i$, and since for h sufficiently small and $u_0 - h < u < u_0 + h$ the only e to change value* is $e^i$, we have that

$$X(u_1) + 2|e^i x^i| = X(u_2),$$

where

$$u_0 - h < u_1 < u_0 < u_2 < u_0 + h.$$

Hence the slope is a monotonic increasing step function. Since for u sufficiently small all $e^i x^i < 0$, and for u sufficiently large all $e^i x^i > 0$, at some intermediate point or points either the slope is zero or it changes from negative to positive without becoming zero. In the first case a single closed $C_1$ is the minimum complex; in the second, a $C_0$. In either case the curve given by (2) when $\nu = 1$ is concave upward and has just one minimum complex, except for complexes of lesser dimension constituting the boundary of this complex. An obvious consequence is

Lemma I. The set of points u for which v is less than some number N forms a convex point set.

* The e's corresponding to equations proportional to the i-th equation of (4) also change value at $u_0$. This does not destroy the argument.

This result is easily extended to the general dimension $\nu$. If for any two points $u_1$, $u_2$ of $E_\nu$, $v(u_1) < N$ and $v(u_2) < N$, the plane in $E_{\nu+1}$ given by $u^\alpha = u_1^\alpha + \lambda(u_2^\alpha - u_1^\alpha)$ makes a one-dimensional section of S. By Observation II, the points u lying on the projection of this section on $E_\nu$ have the property of Lemma I, and of course lie on the straight line joining $u_1$ and $u_2$. This is the property required for a convex point set. Hence

Theorem I. The set of points $u^\alpha$ of $E_\nu$ for which v as given by (2) is less than a fixed quantity forms a convex point set.

From this it follows immediately that there is a unique minimum complex. It is appropriate here to point out that no two complexes can be contained in a single plane of the same dimension. This follows from the equation giving monotonicity of slope in one dimension, and Observation II.

5. Gradient Directions. From here on the treatment will be of v as a function defined on $E_\nu$, and the equations will represent objects in $E_\nu$, unless otherwise stated. Complex and bend will also refer to the projections on $E_\nu$ of the complexes and bends of S. For a single-valued function defined on $E_\nu$, the gradient at a point is the projection of a normal to the surface representing the function in $E_{\nu+1}$. If the function is defined only over a subspace of $E_\nu$ possessing derivatives, the gradient will be required also to be tangent to the subspace. This is sufficient to determine a unique direction, and preserves the property that, for an infinitesimal displacement in any direction, the value of the function decreases most rapidly in the direction of the gradient. Here the gradient is taken negative to its usual sense.

A point u lying on a $C_r$ but not on a $C_{r-1}$ will have a gradient in $C_r$ and also in each higher-dimensional complex on whose boundary $C_r$ lies. If the gradient for u as a point of $C_{r+k}$ points into $C_{r+k}$ (remembering that u lies on the boundary), this will be called a usable gradient. In the case of the greatest k for which there exists a usable gradient, there exists but one $C_{r+k}$ providing such a gradient, and that gradient is the "best" gradient; that is, of all directions in $E_\nu$ it provides the direction of most rapid decrease of the function v. This follows from Theorem I. Furthermore, all complexes of lesser dimension providing usable gradients lie on the boundary of this $C_{r+k}$. In fact

Theorem II. If for a point u on $C_r$ two complexes $C_s$ and $C'_s$, $s > r$, lying in different bends of degree $\nu - s$ but incident at $C_r$, both provide usable gradients for u, then the complex $C_{s+1}$ on whose boundary both $C_s$ and $C'_s$ lie also provides a usable gradient for u.





This follows from Theorem I. Select $u_1$ on the gradient in $C_s$, and $u_2$ on the gradient in $C'_s$, for which $v(u_1) = v(u_2)$. The join of $u_1$ and $u_2$ lies in $C_{s+1}$, and for some point $u_3$ on this join, $v(u_3)$ is less than $v(u_1) = v(u_2)$. Also, the distance $\overline{u u_3}$ is less than at least one of $\overline{u u_1}$, $\overline{u u_2}$. Hence $C_{s+1}$ must contain a usable gradient.


6. Selection of Best Gradient at Bends. The direction of the gradient for a point $u_0$ considered as lying on a $C_\nu$ is given by

(5) $g^\alpha = -e^i(u_0)\, x^i_\alpha$.

If $u_0$ lies in the interior of a face, this is unique. If $u_0$ lies in a bend, so that some $e^i$ are not determined, the $g^\alpha$ for each face is found by selecting the indeterminate $e^i$ as +1 or -1, according to the face being considered.

For a point $u_0$ considered as lying on a bend of degree r, given by r independent equations of (4):

(6) $x^\lambda_\alpha u^\alpha - y^\lambda = 0 \qquad (\lambda = 1, \ldots, r)$,

the gradient for a particular $C_{\nu-r}$, determined by the conditions at the beginning of section 5, is

(7) $g^\alpha = x^\lambda_\alpha k_\lambda - X_\alpha$,

where $k_\lambda$ satisfies

$$x^\mu_\alpha x^\lambda_\alpha k_\lambda = x^\mu_\alpha X_\alpha \qquad (\mu = 1, \ldots, r),$$

and $X_\alpha$ is as given in (2), the choice of sign for the indeterminate $e^\lambda$ $(\lambda = 1, \ldots, r)$ being immaterial. They may, in fact, be taken as 0 in this instance.
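Formula (7) is the projection of $-X_\alpha$ onto the directions tangent to the bend, so $x^\mu_\alpha g^\alpha = 0$ for every bend equation. Below is a minimal NumPy sketch that assumes the r bend equations are independent and takes the indeterminate $e^\lambda$ as 0, as the text permits. The data and the name `bend_gradient` are hypothetical.

```python
import numpy as np


def bend_gradient(X, y, u0, bend):
    """Gradient (7) at a point u0 lying on the bend x^lam . u = y^lam (lam in bend).

    Signs e^i come from the residuals at u0; the indeterminate e^lam of the
    bend equations are set to 0, which the text says is permissible.
    """
    X, y, u0 = (np.asarray(a, dtype=float) for a in (X, y, u0))
    bend = list(bend)
    e = np.sign(X @ u0 - y)
    e[bend] = 0.0
    Xa = e @ X                               # X_alpha = sum_i e^i x^i_alpha
    B = X[bend]                              # rows x^lam_alpha, shape (r, nu)
    k = np.linalg.solve(B @ B.T, B @ Xa)     # x^mu x^lam k_lam = x^mu X
    return B.T @ k - Xa                      # g^alpha = x^lam_alpha k_lam - X_alpha


X = [[1, 0], [1, 1], [1, 2], [1, 3]]
y = [1, 2, 3, 5]
u0 = [1.0, 0.5]                              # satisfies equation i = 0 exactly
g = bend_gradient(X, y, u0, bend=[0])
print(g, np.asarray(X, dtype=float)[[0]] @ g)  # gradient, and its zero component normal to the bend
```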

For a point $u_0$ lying on an r-bend given by (6), to determine which complex contains the best gradient, each $(r - 1)$-bend incident on the r-bend at $u_0$ is tested for a usable gradient. Theorem II then determines the complex containing the best gradient.

There are 2r such complexes incident at $u_0$, given by the r sets of equations selected from (6):

(8λ) $x^\mu_\alpha u^\alpha - y^\mu = 0 \qquad (\mu = 1, \ldots, \lambda - 1, \lambda + 1, \ldots, r), \qquad (\lambda = 1, \ldots, r)$.

The two complexes lying in the same $(r - 1)$-bend have the same equations in (8), but are distinguished later by $e^\lambda(u_0)$ for the omitted equation being taken first +1, then -1.

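The gradient for the λth pair of complexes is

$$g^\alpha_\lambda = x^\mu_\alpha k_{\lambda\mu} - X_\alpha,$$

similar to (7), but not identical. For $e^\lambda = +1$ in determining $X_\alpha$ we have $g^\alpha_{\lambda+}$, and for $e^\lambda = -1$, $g^\alpha_{\lambda-}$. We restrict the consideration to $e^\lambda = +1$.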
The line in the direction of greatest slope is then

$$u^\alpha = u_0^\alpha + g^\alpha_{\lambda+} t.$$

Now $u_0$ is here considered as lying on the complex given by (8λ) with $e^\lambda = +1$. In order that $g_{\lambda+}$ point into this face, the deviation for the λth observation must exceed 0 when $t > 0$; otherwise, for a displacement in the direction of $g_{\lambda+}$, $e^\lambda$ changes sign immediately and the course is in the other complex. This deviation is

$$x^\lambda_\alpha u^\alpha - y^\lambda = x^\lambda_\alpha u_0^\alpha - y^\lambda + x^\lambda_\alpha g^\alpha_{\lambda+} t = x^\lambda_\alpha g^\alpha_{\lambda+} t.$$

Had $g_{\lambda-}$ been used, this deviation would have to be less than 0. Hence a necessary and sufficient condition that a complex given by (8λ), with either choice of $e^\lambda$, possess a usable gradient is

(9) $$\phi_{\lambda\pm} = \pm x^\lambda_\alpha g^\alpha_{\lambda\pm} = \pm\left[x^\lambda_\alpha x^\mu_\alpha k_{\lambda\mu} - x^\lambda_\alpha X_\alpha\right] > 0.$$

For $r = 1$ the condition is given by (9) with the first sum merely omitted. $\phi_{\lambda+}$ and $\phi_{\lambda-}$ cannot both exceed 0.

When all sets of equations (8λ) are tested by (9), the equations common to all sets possessing a usable gradient determine the complex with the best gradient, retaining the values of e for which (9) was satisfied.

7. Property of the Minimum Point. For a minimum point, given by (6) with $r = \nu$, all $\phi_\lambda$ must be negative. Define $X^{\lambda\mu} = x^\lambda_\alpha x^\mu_\alpha$ and $X^\lambda = x^\lambda_\alpha X_\alpha$ for convenience. Then in (9), the numbers $k_{\lambda\mu}$, $-1$ are seen from their definition in (7) to be proportional to the cofactors of the λth row of the matrix $(X^{\lambda\mu}, X^\lambda)$, μ having the same range as λ. Thus $\phi_{\lambda+} = c\,\mathrm{Det}(X^{\lambda\mu}, X^\lambda_+)$ and $\phi_{\lambda-} = -c\,\mathrm{Det}(X^{\lambda\mu}, X^\lambda_-)$, where in the first case $X^\lambda$ is determined with $e^\lambda = +1$, in the second with $e^\lambda = -1$. The factor of proportionality, c, must be the same, since $X^{\lambda\mu}$ is unaffected by a change of $e^\lambda$. Now let $X^{*\lambda} = x^\lambda_\alpha X^*_\alpha$, where $X^*_\alpha = \sum_\kappa e^\kappa x^\kappa_\alpha$, the range of κ omitting the range of λ. Then

$$\phi_{\lambda+} = c\left[\mathrm{Det}(X^{\lambda\mu}, X^{*\lambda}) + \mathrm{Det}(X^{\lambda\mu}, X^{\lambda\lambda})\right]$$

and

$$\phi_{\lambda-} = -c\left[\mathrm{Det}(X^{\lambda\mu}, X^{*\lambda}) - \mathrm{Det}(X^{\lambda\mu}, X^{\lambda\lambda})\right].$$

Hence

$$\phi_{\lambda+}\phi_{\lambda-} = -c^2\left\{\left[\mathrm{Det}(X^{\lambda\mu}, X^{*\lambda})\right]^2 - \left[\mathrm{Det}(X^{\lambda\mu}, X^{\lambda\lambda})\right]^2\right\}.$$

Now let A represent the square matrix $(x^\lambda_\alpha)$, α giving the rows and λ the columns. Let $B_\lambda$ represent the matrix formed from A by replacing the λth column by $x^*$. Then

$$\phi_{\lambda+}\phi_{\lambda-} = -c^2\left[\mathrm{Det}^2(A'B_\lambda) - \mathrm{Det}^2(A'A)\right] = -c^2\,\mathrm{Det}^2 A\,\left(\mathrm{Det}^2 B_\lambda - \mathrm{Det}^2 A\right),$$

and this will have the same sign as

$$\Psi_\lambda = |\mathrm{Det}(A)| - |\mathrm{Det}(B_\lambda)|.$$

Since $\phi_{\lambda+}$ and $\phi_{\lambda-}$ are never both positive, and at the minimum are both negative for all λ, at the minimum all $\Psi_\lambda > 0$. To determine all $\Psi_\lambda$ together, let, in matrix notation, $z' = (z_1, \ldots, z_\nu)$ and $x^{*\prime} = (x^*_1, \ldots, x^*_\nu)$, where the $x^*_\alpha = X^*_\alpha$ were defined previously. Determine z as the solution of $Az = x^*$. Then, by Cramer's rule, the $|\mathrm{Det}(B_\lambda)|$ are equal to $|z_\lambda|\,|\mathrm{Det}(A)|$. Hence a necessary and sufficient condition that $\Psi_\lambda > 0$ for all λ is that all $|z_\lambda|$ be less than one. Hence

Theorem III. If a zero-complex is given by a set of equations whose matrix is M, a necessary and sufficient condition that the complex be a unique minimum is that the solutions of $M'z = x^*$ all be less than one in absolute value. If k of the solutions are equal to one in absolute value, and the rest are less than one, the minimum is a complex of dimension k with the zero-complex as one of its corners.

The last statement follows since, if one solution is 1 in absolute value, a corresponding $\phi_\lambda = 0$, and hence no gradient, usable or not, exists. Thus the corresponding complex is parallel to $E_\nu$.
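Theorem III gives a direct test of a vertex. The NumPy sketch below solves for the vertex from $\nu$ chosen equations, forms $x^*$ from the signs of the other residuals, and solves $M'z = x^*$. It assumes no other residual vanishes at the vertex. The four-point data set and the name `theorem_iii` are invented for illustration. For these data, the line through the first and last points is the unique minimum, since every other line through two of the points gives a larger sum of absolute deviations.

```python
import numpy as np


def theorem_iii(X, y, rows):
    """Test the vertex (zero-complex) defined by nu equations of (4), per Theorem III.

    Returns (u0, z, is_unique_minimum). Assumes the residuals of all other
    observations are nonzero at the vertex.
    """
    X, y = np.asarray(X, dtype=float), np.asarray(y, dtype=float)
    rows = list(rows)
    M = X[rows]                                   # matrix of the equations, nu x nu
    u0 = np.linalg.solve(M, y[rows])              # the vertex
    others = [i for i in range(len(y)) if i not in rows]
    e = np.sign(X[others] @ u0 - y[others])       # e^kappa for the remaining observations
    x_star = e @ X[others]                        # x*_alpha = sum_kappa e^kappa x^kappa_alpha
    z = np.linalg.solve(M.T, x_star)              # M' z = x*
    return u0, z, bool(np.all(np.abs(z) < 1))


X = [[1, 0], [1, 1], [1, 2], [1, 3]]              # fit y = u1 + u2 * x to four points
y = [0, 2, 1, 3]
print(theorem_iii(X, y, [0, 3]))                  # line through (0, 0) and (3, 3)
print(theorem_iii(X, y, [1, 2]))                  # line through (1, 2) and (2, 1)
```

For the first vertex z is about (-1/3, 1/3), so it is the unique minimum; for the second z = (3, -3), so it is not.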

8. Minimization for One Dimension. A method for minimization of (2) when there is just one parameter evolves from the monotonicity of slope in that case. Suppose the variates are $w^i$ and $z^i$, and (1) is

(10) $v^i = w^i t - z^i$.

Suppose the variates are arranged in order of $z^i/w^i$, starting with the smallest. The slope of the rth segment (Fig. 1) from the left is

$$\sum_{i=1}^{r} |w^i| - \sum_{i=r+1}^{n} |w^i|.$$

The minimum occurs where the slope is 0 or changes from negative to positive; that is, where the first sum equals or exceeds the second, or, equivalently, where the first sum equals or exceeds half the total. This is a standard computation. If the change takes place when $r = k$, then $t = z^k/w^k$ is the value of t giving the minimum.
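The rule of section 8 amounts to a weighted median of the ratios $z^i/w^i$ with weights $|w^i|$. Below is a minimal Python sketch; the function name is hypothetical. Terms with $w^i = 0$ do not depend on t and are skipped.

```python
def line_minimum(w, z):
    """Return t minimizing sum_i |w_i t - z_i|, following section 8.

    Breakpoints z_i / w_i are sorted; the slope of the r-th segment is
    sum_{i<=r} |w_i| - sum_{i>r} |w_i|, so the minimum is at the first
    breakpoint where the running sum of |w_i| reaches half the total.
    """
    pts = sorted((zi / wi, abs(wi)) for wi, zi in zip(w, z) if wi != 0)
    if not pts:
        raise ValueError("every w_i is zero: the sum does not depend on t")
    half = sum(weight for _, weight in pts) / 2
    running = 0.0
    for t, weight in pts:
        running += weight
        if running >= half:
            return t


print(line_minimum([1, 1, 1], [1, 2, 10]))     # ordinary median -> 2.0
print(line_minimum([1, 2, -3], [3, 2, -3]))    # -> 1.0
```

When the running sum hits exactly half the total, the slope is zero on the next segment. In that case every t between the two breakpoints is a minimum (the closed $C_1$ case of section 4), and this sketch returns the left end.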

9. Minimization Procedure for $\nu + 1$ Dimensions. For any continuous function with a unique minimum and having the property of Theorem I, the following holds. Let $u_0$ be any point of $E_\nu$. Let $u_{i+1} = u_i + \lambda_i t_i$, where $\lambda_i$ is any direction chosen at random and $t_i$ is the value of t for which the function attains a minimum on the line $u_i + \lambda_i t$. Then the probability is one that

$$\lim_{i \to \infty} u_i = \bar u,$$

where $\bar u$ is a minimum point of the function. If $\lambda_i$ is always taken as the gradient at $u_i$, such a procedure is called the "method of steepest descent" for approaching the minimum point.

Usually the limit is never attained. In this case, however, the minimum is attained. The minimum can be approached as closely as desired; hence a complex incident on the minimum is reached. But the convex point sets of Theorem I surrounding the minimum complex are all similar convex polyhedrons in $E_\nu$, whose corresponding faces are parallel, and the gradients at points on a bend cannot point into a higher-dimensional complex on the bend. Hence the sequence of points lies on bends of successively greater degree, and must eventually attain the minimum complex.

TABLE II
Points $u_k$, with $u_{k+1} = u_k + t_k g_k$

$u_0 = (38, -5, -2)$
$u_1 = (37.98202, -4.74828, -1.48467)$
$u_2 = (37.45908, -2.07142, -1.85631)$
$u_3 = (32.83333, -2.07142, -1.76191)$


TABLE III
Computation of $t_k = z_k/w_k$

Σ|w_k|, in order of col.    exceeds    at i =    hence t_k =
Σ|w_0|   (10)               17521      16        .00599334
Σ|w_1|   (15)                2502       2        .0397792
Σ|w_2|   (20)                4610      10        .00496545


TABLE IV
Gradients $g^\alpha_k$ for column (5k + 8)

k    g^1        g^2      g^3
0    -3         42       86
1    -13146     67293    -9345
2    -931588    0        19012


The computational procedure is as follows:

1. Select a point $u_0^\alpha$.

2. Determine the gradient $g_0^\alpha$ from (5).

3. Compute $w_0^i = x^i_\alpha g_0^\alpha$, $z_0^i = y^i - x^i_\alpha u_0^\alpha$.

4. Determine $t_0$ by the method of section 8.

5. Compute $u_1^\alpha = u_0^\alpha + g_0^\alpha t_0$.

6. Determine the complex containing the best gradient by (9), and the gradient $g_1^\alpha$ by (7),

and so proceed to the minimum. This may finally be tested by Theorem III.
Step 5 is unnecessary, since the only use for $u_1^\alpha$ is to determine $e^i(u_1)$. But $e^i(u_1) = e^i(t_0)$, the latter referring to the computation in step 4. Also, after the first step, it is easier to compute $z^i$ by

$$z^i_{k+1} = z^i_k - w^i_k t_k.$$
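Steps 2 to 5 can be sketched in Python as a single iteration. This is a simplification, not the author's full procedure. It does not select the best gradient by (9) (step 6), and it sets $e^i = 0$ for observations whose residual is exactly zero, so on a bend it can stall where the paper's method would not. Because the line search minimizes over all t, including t = 0, the sum of absolute deviations can never increase. The data and function names are hypothetical.

```python
def residuals(X, y, u):
    """v^i = x^i_alpha u^alpha - y^i."""
    return [sum(a * b for a, b in zip(row, u)) - yi for row, yi in zip(X, y)]


def line_minimum(w, z):
    """Minimizer of sum_i |w_i t - z_i| (section 8); terms with w_i == 0 skipped."""
    pts = sorted((zi / wi, abs(wi)) for wi, zi in zip(w, z) if wi != 0)
    half = sum(weight for _, weight in pts) / 2
    running = 0.0
    for t, weight in pts:
        running += weight
        if running >= half:
            return t


def descent_step(X, y, u):
    """One pass of steps 2-5 along the direction given by (5)."""
    res = residuals(X, y, u)
    e = [(r > 0) - (r < 0) for r in res]                                     # signs e^i (0 on a bend)
    g = [-sum(ei * row[a] for ei, row in zip(e, X)) for a in range(len(u))]  # step 2, eq. (5)
    w = [sum(a * b for a, b in zip(row, g)) for row in X]                    # step 3: w^i = x^i . g
    z = [-r for r in res]                                                    # step 3: z^i = y^i - x^i . u
    if not any(w):
        return list(u)                     # zero gradient: nothing to search along
    t = line_minimum(w, z)                                                   # step 4
    return [ua + ga * t for ua, ga in zip(u, g)]                             # step 5


X = [[1, 0], [1, 1], [1, 2], [1, 3]]
y = [1, 2, 3, 5]
u0 = [0.0, 0.0]
u1 = descent_step(X, y, u0)
print(u1, sum(map(abs, residuals(X, y, u0))), sum(map(abs, residuals(X, y, u1))))
```

Here one step moves from $u_0 = (0, 0)$, where the sum of absolute deviations is 11, to approximately $(0.8, 1.2)$, where it is about 1.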

10. Example. The computation for (9) is not so great as it would seem, since some of the work is duplication and some must be computed anyway for the gradient. Even so, for $r > 3$ it becomes, perhaps, more arduous than its contribution would seem to justify. For $\nu > 4$ it is recommended that the test of (9) be omitted for points on bends of third degree or greater, and that the final test of Theorem III be applied at the end of the work. If this test shows that the minimum has not been reached, it will at the same time indicate the complex in which the best gradient lies.

The minimum number of steps is 0. The maximum number is tremendous but finite. The expected number is probably a little greater than $\nu$.

In Tables I to IV, the method is applied to the problem used by Rhodes to illustrate his method. The independent variates are shown in columns (2), (3), (4) of Table I, the dependent variate in column (5). The only other original datum is the initial point, selected by guess, shown in line 1 of Table II. Since slightly different formulas were used in the computation, the signs of cols. (6), (8), (11), (13), (18) are reversed, and the gradients in Table IV are multiplied by constants. As they are used only for directions, this does not matter.

Princeton University,

Princeton, N. J.



A STUDY OF A UNIVERSE OF n FINITE POPULATIONS WITH 
APPLICATION TO MOMENT-FUNCTION ADJUSTMENTS 
FOR GROUPED DATA 

By Joseph A. Pierce 

The object of this paper is to study the case of a universe of n finite popula- 
tions, considering both the expectations of population moment-functions and 
the moments of sample moments, and to make applications of the results which 
may be of interest to mathematical statisticians. The sampling formulas which 
are derived reduce to the usual infinite or finite sampling formulas, under 
appropriate assumptions. Also a method is given whereby finite sampling 
formulas may be transformed into the corresponding infinite sampling formulas. 

The general methods and formulas which are given in Part I for the expectations of population moment-functions are used, in Part II, to find the expectations of moments of a distribution of discrete data grouped in "k groupings of k".


I. A Study of a Universe of n Finite Populations

Let $U_n$ be a universe composed of the set of populations ${}_rX$ $(r = 1, 2, \ldots, n)$, each population ${}_rX$ consisting of a finite number of discrete variates ${}_rX_i$ $(i = 1, 2, \ldots, N)$, $(N > n)$. The tth moment of ${}_rX$ is denoted by ${}_r\mu'_t$. The tth central moment of ${}_rX$ is denoted by ${}_r\mu_t$. The tth moment and the tth central moment of $U_n$ are respectively denoted by $\mu'_t$ and $\mu_t$. The expected value of a variable y is denoted by $E(y)$. We have

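$${}_r\mu'_t = E({}_rX_i^t) = \frac{1}{N}\sum_{i=1}^{N} {}_rX_i^t, \qquad {}_r\mu_t = E({}_rX_i - {}_r\mu'_1)^t = \frac{1}{N}\sum_{i=1}^{N} ({}_rX_i - {}_r\mu'_1)^t,$$

(1.1) $$\mu'_{1:\mu'_t} = E({}_r\mu'_t) = \frac{1}{n}\sum_{r=1}^{n} {}_r\mu'_t, \qquad \mu'_{1:\mu_t} = E({}_r\mu_t) = \frac{1}{n}\sum_{r=1}^{n} {}_r\mu_t.$$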

IVe also note that ^ may be written /im . . * • ’Mi j * 

1. The expected value of moments and central moments. It follows easily from (1.1) that

( 1 . 2 ) 
From the usual formula for central moments in terms of moments, we get

(1.3) $$\mu'_{1:\mu_t} = \sum_{j=0}^{t} (-1)^j \binom{t}{j}\, \mu'_{1:\mu'_{t-j}\mu'^{\,j}_1}.$$

Terms of the form $\mu'_{1:\mu'_{t-j}\mu'^{\,j}_1}$ may be evaluated by use of the well-known formulas [20; p. 58] for changing from moments to central moments in the case of a multivariate distribution. Two of these formulas are given below.

(1.4) 


We find that 

i! 

where pipj is a two-part partition of t and ri -f r* = 1 . 

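Using (1.3) and (1.5), we get

(1.6) $\mu'_{1:\mu_2} = \mu_2 - \mu_{2:\mu'_1}$,

(1.7) $\mu'_{1:\mu_3} = \mu_3 - 3\mu_{11:\mu'_1,\mu'_2} + 6\mu'_1\mu_{2:\mu'_1} + 2\mu_{3:\mu'_1}$,

(1.8) $\mu'_{1:\mu_4} = \mu_4 + 6(\mu_2 - 2\mu'^2_1)\mu_{2:\mu'_1} - 12\mu'_1\mu_{3:\mu'_1} + 12\mu'_1\mu_{11:\mu'_1,\mu'_2} - 4\mu_{11:\mu'_1,\mu'_3} + 6\mu_{21:\mu'_1,\mu'_2} - 3\mu_{4:\mu'_1}$,

etc.

If the n populations are identical, it is evident from the definition of $\mu'_{1:\mu_t}$ that, for all finite t,

$$\mu'_{1:\mu_t} = \mu_t.$$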


2. The expected value of Thiele seminvariants. If the Uh Thiele sem- 
invariant is denoted by Xi , then 

/, ON (-l)'-‘tl(p-l)l 

ti.y; Mi:x, 2, «.1(2!)*«(3!)‘' . . ."(v!)*' - 

the summation being taken for all positive integers «,(* = 1, 2, • ■ • j;), for which 


P ™ f f — iSi . 

<»1 <-l 

Terms of the form m.,., • ■ ...idim • • are evaluated by (1.4). We have 
(1.10) Ml:X, = X» — Mi:„, . 
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(1.11) Ml:x, = X* — 3 mii:ihi», + 6Xiiiit:^i + 2iig:m . 

(1.12) /il:X 4 = X 4 + 12[X2 — 2 X 1 ] ikini — 24Xi/23;^j + 24Xi/iu:^j^j 

etc. 

If the n populations are identical then, for all finite 

Ml:X« = X< . 

3. Generalized sampling. It follows from definition that all rational isobaric 
moment-functions have the property that they may be expressed in terms of 
power sums and power product sums with certain coeflBcients. Of the power 
sums and power product sums which enter a sampling formula only the power 
product sums take different forms depending on the law of variate selection. 
Now, there are two possible courses which may be followed by one who wishes to 
derive sampling formulas for the case of a single population. 

1 . One may decide in advance on the law which he wishes to govern the 
selection of variates which enter the sample. Then he ma}’^ a.pply this law in 
the evaluation, in terms of moments, of every power product term as it occurs 
in each formula which is derived. 

2. One may derive the formulas for sampling under the condition that the 
law is unspecified, thereby obtaining formulas which are capable of being 
int-erpreted in terms of laws that are decided upon later. 

We illnst /rat(i the two possible courses by considering the formula, 

(1.13) fiv., = " 2* + 

which Carver [12; p. 102) obtains for the case of finite sampling without replace- 
ments. Here r = the number in the sample, s = the number in the parent 
population and = the algegraic sum of the variates of fth sample. Later, 
by evaluating Si* and Zxiij in terms of moments, he finds 

n iA\ - r(8-r)_ 

(1.14) fit:, ~ /«»:.• 

8—1 

(It should be not«d that Carver (12; p. 116] obtained the corresponding formula 
for infinite sampling by letting s — 

The preceding development is entirely in accord with the first of the courses 
stated above. It is also the standard procedure and is the course followed by 
such writers as Isserles [2], Neyman [6], Church [7], Pepper [11] and Dwyer [20], 
in deriving finite sampling formulas. Also, it is the course followed by such 
authors as “Student” [1], Tchouproff [3], Church [6], Craig [9], Fisher [10], and 
Georgesque [13] for the case of sampling from an infinite population. 
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However, in (1.13), it is possible to employ the definition. 


- MU. 

Then (1.14) becomes 

(1.15) fit-., = rfit + r(r - 1)/Si,i . 

Formula (1.15) may be interpreted as holding for either finite or infinite 
samplmg, depending on the interpretation which is given to iii,j . It may be 

easily shown that, if the sampling is from a limited supply, At and 

S i 

(1 .15) reduces to (1.14). If the sampling is from an infinite supply, /ii,i becomes 
Ml and therefore 

M*:. = rw:, , 

which is the formula [12; p. 115] that corresponds, in the infinite case, to (1.14). 

Thus, either of the two courses is possible in the case of sampling from a single 
population. However, if one wishes to get general formulas which hold for both 
infinite and finite sampling, he should follow the second course. Similarly, in 
order to obtain generalized sampling formulas where the relations between the 
variates are unspecified and the populations arc assumed to be different, the 
second course should be followed. 

It appears that Tchouproff [3], [4] was the first to approach the sampling 
problem from such a general point of view. However, his methods of derivation 
arc quite complicated and his results, in general, are difficult to apply to a given 
problem [5], [8]. 

Samples of n are foitned from nU n by chosing one variate from each of the n 
populations. A typical sample is 


lX,‘j , aX»j f 8X», , * • • , r^ij. > • • • > nXi^ , 

We define [4 ; p. 472] 


(1.16) 


r . . ■ • • ,„Xi’^) 


' • *r 






n 


(») 






— I 


where k represents the number of possible terms of the given form; S, means v 
times the sum for unequal values of n , r* • • . r, and = n(n — 1) • • • 
(n - V + 1). 


4. Moments and product moments of sample moments. The fth moment of 
the jth sample is denoted by . The «th moment of ,«», for all j is denoted by 
Vemt where the prime indicates that the moments of the universe are measured 
about a fixed point. 1 1 follows that 



VmiTi: POPULATIONS 


315 


1 

(1.17) aad Wi • 

fl r*«l 

Also, the general product moment, in which the variates of both the sample 
and the universe are measured about a fixed point, is defined by 

(1.18) ~ ••• 

As an illustration of the methods used to derive the formulas of this section, 
consider a special case of (1.18) when «i = 2 and «< = 0, (t * 2, 3, • • ■ , p). Then 



Therefore, by (1.1), (1.2) and (1.16), we get 

(1.19) V*!m, = 

Using the formulas [20; p. 34] relating products of power sums and power 
products to expand expressions of the type E{jtn‘i\ /»»i| • • • ,-m|j), we give, in the 
tables below, formulas for moments and product moments of sample moments 
through weight six. The number in a cell and the coefficient, in the same 
column, at the top of the table should be taken as the coefficient of the moment 
which is found in the same vertical division. The .coefficients in the vertical 
division are coefficients of the entire right members of the formulas for the 
respective moments. 

Terms of the form H h = h = • ■ ■ = <r = 1, are sometimes written 

. 

The numbers in the cells of the tables are identical with the numbers in the 
cells of the tables given by Dwyer [19; p. 30] for the expected value of partition 
products. 

6. Moments of centml moments of samples of n. The (th central moment of 
the ith sample is denoted by jMi . Then, 

( 1 . 20 ) jifit {,Xi, - itniY 

n r«l 

and 

(1.21) Ws, “ Jsfi 2 (r*<, - l«»l)‘l . 

r-l J 
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TABLE 1 


(1) 

(2) 


(3) 




Coef. 

Coef. 

n»> 

Coef. 

n 

n<*> 

n(») 

Ml 

Ml 

Ml* 


M. 

Mt.l 

Ml* 


'Mi:ini 

*Mi:»n 

n““i 

1 

Vi:mi 

n-i 

1 i 




'M2:mi 

n"* 

1 

Ml l^mims 

n“* 

1 

• 1 






Vi:mi 

n-* 

1 

3 

1 



Coef. 

n 

nta> 

nC*) 

„(•) 

n(») 

n(*) 

n“> 









M6 

M4.1 

MS.I 

Ml.l* 

Mi*.i 

M2.l» 

M1‘ 


(4 





'Mi:m» 

n-* 

1 







Coef. 

n 

n<*> 

.^(2) 

»U) 


Mlllmim4 

n“* 

1 

1 







M4 

Ms.i 

M2. 2 

M2.1* 

Mi< 

MlKmjmi 

n * 

1 


1 




Vi:»rt4 

/r» 

1 





VsKmimi 

n"* 

1 

2 

1 

1 



Mll^niimj 


1 

1 




VlCmimj 

n“» 

1 

1 

2 


1 


ViJmj 

n * 

1 


1 



f.. 

MSlImimj 


1 

3 

4 

3 

3 

1 ; 

V2i:mi»n* 

n"* 

1 i 

, _ 1 

i 2 

1 i 

t 1 



Vs<mi 

n-* 

1 

5 

10 

10 

15 

10 { 

1 'M4:m, 

n"* 

1 


3 

1 

6 

1 

1 
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After writing (rX.-, — j-mi)* as the sum of the general term of a binomial series 
and then expanding the resulting right member of (1.21) as a product of power 
sums [20; p. 19], we get 


(1.22) 






s! 

rilril • • • r,!iri!ir*! • • • 


z 




f^iv 




Mrirj-frp.m/-, 


”*<1* * 


where == P iri , irj , • • • are the numbers of the repeated 

;-l y-l 

parts of 8. 

The mean of th(^ <th central moment takes the following simple form, 

(1.23) = L(-iy(‘)w».-i«., 


where the moments in the right member of (1.23) through weight six are given 
in the tables of section four. Also, 


(1.24) PS:m2 ~ M2:m2 2^/4|l:,nim2 “t” M4:mi * 

(1.25) P8:m2 ^ M3:m2 *4“ 3 • 

~ M3:m| “t" 9 M82:m|fn2 “1“ 4 6 

"I" 4 MU:mim| 12 M4l:m,m2 • 


After substituting from the tables of section four, (1.23) through (1.26) become 

(1.27) 'flumt *= - Mul- 

(1.28) Vi:«, = ";[«- 3w.i + 2mi.]. 

71** 


= -Jn"’(n’ - 3n + 3)(w - Wi) + 3n‘»(2n - 3)/*,., 
n* 

+ 3n‘*(2/ui,i« — Ml*)]* 

Wi = [n®(n’ - 2n + 2)(,m - + 10n®(n - 2 )mi.i . 

Tp 

+ 10n®(n + l)(n — 4 )/ui.ij - 30n‘*^(n — 2)ih*,i 
— 10TC^*^(3n ~ 4)jBi,ii + 4n^*Vi‘l' 


(1.30) 
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Vi:A. = 4 - 5n’ + lOn* - lOn + 6)(^ - mi.i) 

ft 

+ 15n‘*(n’ - 4n* + 7n - 5W» - 10n‘*(2n* - 6» + 5)Kii 

(1.31) + 15n‘*(n’ - 4n* + 6n - 5)in,i* - 60n‘”(n* - 4n + 5 )miai 

+ 15n^**(3tt — — 3n + 5)^,1! 

+ 45n‘^(2n — + 15n‘“(n — — 5»'*Vi»l. 

(1.32) '/»*:*» = -^ [»'**(« — 1)(m 4 ~ 4/i8.i) + n*'’(n + l)in,t — «**^(2mj,ii — mh))- 

ft* 

'lir.mx = — 1)*(m« ~ 6fu.i) + 3n**^(« — l)(n* — 2n + S)m,t 

fir 

- 2n"’(3n* - 6n + 5 )m 3.» + n‘"(n* - 3n* + 9n - 15)/*,. 

(1.33) — Zn^'\n - l)(n - S)/*.,]. — 12V“(n* - 4n + 6)/i,,,.i 
+ 4n^‘’(3n — S)/.,,.. — 3n‘^’(n* — 6n + 15)/i,».i. 

+ n^*'(3/i,,i4 — /ti*)]. 

'/**=«. = -,ln®(n - Ifin - 2)(,*, - 6/*..,) - 3n‘«(n - 2)*(2n - 5 )/m., 
+ ft^^\ft 2)*(w* — 2n + 10)/i3,8 

(1.34) - 6n‘’>(n - 2)(n* - 6n + 20)/*,., .i + 3n‘"(n - 2)(7n - 10)/*4.i. 
+ 3n‘*'(3n* - 12n + 20)/.,. + 4n“’(n - 2)(n - lO)/*,.,. 

+ 9n^^(n* — 8n + 20)/.,.4t — 4n^*’(3/.M« ~ Mi*)!- 

6. The variance of tiie variance of sanities of n. The variance of the variance 
of samples of n, when the moments of the universe are measured about a fixed 
point, is defined as 

(1.35) fiiimj “ f^timx [ ■ 

Therefore, from (1.27) and (1.32), 

'w:*, = 4l»®(« — 1 )(m 4 — 4/to,i) 4- n^*’(n - l)/a,,, — n‘‘’(2/.,,i, - /ii«)] 
n* 

(1.36) 

■ (^) 

Tchouproff [4; p. 492] gave a formula (8) for the variance of the sample 
variance but his result is unwieldy due to the fact that moments of the universe 
are measured about the mean. 
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7. Convwitioml infinite Min^faig fonauhui derired from gananUfaed wnfliiig 
foimalat. The term “infinite sampling" is to be interpreted as meaning: 
mmpling from an unlimited sujyply or sampling from a limited supply wUh repeti- 
tions permitted. In each of these situations the variates are independent [5 ; p. 79]. 

First, it is assured that the n populations are identical, that is, tX » tX » • • • 
» nX. This assumption results in the fact that, for a fixed f, * ■ * ■ 

nfit and iji( » iM( « • • • » nfit • Therefore, under the assumption of identical 
populations, every moment may be interpreted as either the moment of n identi- 
cal populations or as the moment of a single population. The only other as- 
sumption is that the sampling is “infinite". 

From the condition of independence [3 ; p. 141], we have 




Therefore, 


riff ••• r,M«, • 


(combining the condition of independence with that of identical populations, we 
have 

(1.37) j St rirf-rjltitf-t, — St riM(| * • * r,Pt, — • 


By (1.16) and (1.37), we may write 

(1-38) . . = Ml, Ml, • • • Ml, • 

Since the only terms of the generalized sampling formulas which are affected 
by the assumption of “infinite sampling" are those of the form mi,i, - i. , the 
problem of obtaining conventional infinite sampling formulas from generalized 
sampling formulas is, in practice, a mechanical one. Simply write terms of the 
form Mill, •I, which appear in a generalized sampling formula, as moMi, - - * Mi, 
and one automatically obtains the corresponding infinite sampling formula. 

As an illustration of the method, consider the generalized sampling formula 
( 1 .36) for the variance of the sample variance. When ( 1 .38) is utilized to change 
it into the corresponding infinite sampling formula, (1.36) becomes 

(1.39) 'Mt:M, = --7- ((n — 1)(m« ~ 4^i,#ii) — (n — 3)^ + 2(2n — 3)(2mimi *“ Mi)], 

fr 

which is the usual formula (20; p. 75] for the variance of the sample variance 
when the moments of the imiverse are measured about a. fixed point. If it is 
assumed that the moments of nUn are measured about the mean, formula (1.39) 
becomes 

(1.40) ihimt ■= — [(n - 1)m4 - (n - 3)mJ], 

fir 

which was published by “Student" [1 ; p. 3] in 1908. 
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8. Conventional finite sampling fonnulas derived from generalized MmpHng 
formulas. The t«nn “finite sampling” is to Ik; interpreted as meaning : sampling 
from a limited supply when repetitions are not permitted. 

In order to reduce generalized sampling formulas to the corresponding formulas 
for finite sampling, the assumptions are made that the n populations are identical 
and that N and n are finite, N > n. The selection of variates which enter each 
sample is restricted in the following manner. If a variate having a given post- 
subscript is chosen, then no other variate having the same post-subscript may be 
chosen for the same sample. 

Now it is evident that terms of the form ••••(, must be redefined on the 
basis of the preceding assumptions. From the expansions [20; p. 32) of power 
product sums in terms of products of power sums, we get the formulas for 
which are given in the following tables. 

The formulas in the tables of this section arc called transfirrmation formulas for 
finite sampling or more briefly transformation formulas. 


The transformation of generalized sampling formulas into corresponding 

_:j. i; e i__ :ii i j._j i... N/lj , 


finite sampling formulas is illustrated by the substitution of 


for > 11.1 


in (1.27). We get 



which is the well-known finite sampling formula for the mean of the variance of 
samples of n. 

From this and the preceding section it is evident that the generalized sampling 
formulas may be considered as formulas for either infinite or finite sampling 
depending upon the interpretation given to terms of the form m«i <i • • • <, . 


9. Transformation of infinite sampling formulas into corresponding finite 
Bflmpling formulas. It is a well-known fact that infinite sampling formulas may 
be obtained from those for finite sampling by letting the size of the parent popula- 
tion become infinite. But, prior to this paper, apparently no one has presented a 
method of obtaining finite sampling formulas from infinite sampling formulas. 
However, by making use of the relations between finite, infinite, and generalized 
sampling, we shall demonstrate that it is possible to transform any infinite 
sampling formula into the corresponding finite sampling formula. 

Since the infinite sampling formulas are obtained from the generalized sam- 
pling formulas by replacing 

by •••/*<, 

it follows that generalized sampling formulas may be obtained from the infinite 
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formulas by replacing 

(1-42) hy 

However, it must be emphasized that the application of (1.42) demands formulae 
which are expressed in terms of moments of sample moments rather than central 
moments of sample moments (although the sample moments may be measured 
about a fixed point or about the mean) and the moments of the universe must be 
measured about a fixed point The reason for these restrictions is to insure that 
each term is accounted for individually. 

After replacements (1.42) are made in the formula for sampling from an 
infinite population, the resulting formula is the corresponding generalized one. 
The step to the corresponding finite sampling formulas is simply the one outlined 
in section eight, namely, the use of the transformation formulas. 

We shall consider, as the first illustration, the infinite sampling formula for 
the mean of the sample variance when the moments of the parent population are 
measured about the mean. The formula is 

(1.43) Ml:m2 = ^ M2. 


When (1.43) is expressed in terms of moments of the parent population about a 
fixed point, we have 

(1.44) - M?]. 


Following (1.42), is replaced by ni,\ and (1.44) becomes (1.27). The use of 
the transformation formula for #u,i gives (1.41) which, when the moments of the 
parent population are measured about the mean, becomes 

(1.45) W:.,. 


Infinite sampling formulas expressed in terms of moment-function, may be 
similarly transformed into the corresponding finite sampling formulas. For 
example, Craig [9; p. 57] gives the second Thiele semin variant of the variance 
of samples as 


(1.46) 


^2lin2 


(n ~ D* 


X, + 2 


’>X?, 


First, we express (1.46) in terms of moments about a fixed point by use of the 
formulas relating Thiele seminvariants and momeni^s [9; p. 12], We also recall 
that the resulting formula should be expressed in terms of moments of sample 
moments rather than in terms of central moments of sample moments. We 
obtain 


(1.47) 


Wmj = ■” 1)m4 ““ 4(n — 1)m«mi + — 2n + 3 )m2 

- 2(n - 2)(n - 3)MtMi + (n - 2)(n - 3 )mi1. 
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The next step is to trausform (1.47) into the corresponding generalized sampling 
formula by use of (1.42). We obtain (1.32). Since we desire to obtain the 
finite sampling formula which exactly corresponds to (1.46), it is necessary to 
transform (1.32) from the second moment of ^ to the variance of and we get 
(1.36). Next the transformation formulas are applied to (1.36). When the mo- 
ments of the parent population are measured about the mean and are replaced 
by Thiele seminvariants, (1.36) becomes 


(1.48) 




W - n)(n - 1) 


n*(Ar - 1)*(JV 


2)(W-3) 

+ 2(Ar*n 


[(N - l){Nn -N-n-l)\A 
3iVn - SAT -f 3n + 3)x|]. 


Formula (1 .48) gives the second Thiele seminvariant of the variance of samples of 
n drawn from a finite parent population of JV. When iV — > w , in (1.48), we 
obtain immediately (1.46). 

It is generally true that infinite sampling formulas are more easily derived than 
are the corresponding finite sampling formulas. The methods of this section 
make it possible to derive the desired sampling formulas for the infinite parent 
population and then transform these infinite sampling formulas into the corre- 
sponding finite sampling formulas. 


II. Moment Function Adjustments fob Grouped Data 


A given distribution of discrete variates may be grouped in “A: groupings of k". 
We desire to find the correction which eliminates the error made in replacing a 
given moment of the original distribution by the average of the corresponding 
moments of the k grouped-distributions. 

Formulas for the adjustments for moments of a grouped-distribution of 
discrete variates were first given (without proof) in the'Editorial of Vol. I, No. 1 
of the Annals of Mathematical Statistics, Later, more satisfactory derivations 
of adjustment formulas were given by Abemethy [24] Craig [25] and Carver [26]. 
However, it was observed by Carver [26; p. 162] that the developments of 
Abemethy and Craig are adjustments about a fixed point and that they fail to 
hold for the case of expectations of central moments if we accept the definition 




« = 2, 3, ...)• 


Here rju* represents the tth central moment of the rth grouped-distribution. The 
formula for the true value of wtii was supplied by Carver [26; p. 162] but he did 
not indicate a general method which might be used for the derivation of i 
(t > 2). 

A distribution of discrete variates grouped in ”k groupings of k” is a Special 
case of a universe of n finite populations and hence the methods and formulas 
for the expectations of population moments are applicable to oUr present 
problem. 
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It is found that the adjustment formulas for moment-functions of grouped 
data involve central moments of a rectangular distribution. It will be con- 
venient for our present purposes to give a brief treatment of the moment-func- 
tions of a rectangular distribution. 


1. Moment-ftmctkms of a rectangular distribution, ('onsider the rectangular 
distribution of discrete variates, 

(2.1) h, 2h, 3h, ■ ■ ■ , kh. 

It is readily shown that the moment generating function of (2,1), 

( 2 . 2 ) Otifi) — iM) ^ ^ ' * * 


may be written 
(2.3) 


QM = 


le sinh \hB 


Setting the expansion of the right member of (2.3) equal to the right member of 
(2.2) and equating coefficients of like powers of 0, we obtain the following recur- 
sion formula for the moments of (1.1) 


(n + 1)"’ .. (n + D® 

fin'.R *T" 


(2.4) 


11 


21 


+ (- 1 ) 


r— 1 


(n -|- 1) 

>1 


(r> 


K fin-T+V.lt 4" • * • — k” h" , 


where Mihb represents the nth moment of a rectangular distribution. Formulas 
for ju„:B , (n = 0, 1, • • • ,10) arc given below. See Sasuly [27 ; p. 27]. 


W):« = 1. 


Mi:« = Uk + l)h. 

= iik + l)(2i: + l)/i* = i(2fc + \)h^uH . 

= \(k + 1)* kh* = kh . 

M4:« = i(3fc’ + 3k- l)h* #X*:« . 

(^•®) in-.K = i(2fc* + 2A: + 1)A* mi:k • 

= ^Zk* + 6k* - 3k + l)h* M*:* . 

W:* = i(3*' + &k* -k* -ik + 2)h* n»:« • 

a<«:b = tV(5*;‘ + 15fc‘ + 5k* - m* - k* + 9k- 3)h* Mt:« . 

w:« = H2fc* + 6k* 4- k* - Sk* -t- A:’ -b 6* - 3)h* m>:« • 

Mio:« = T»T(3Jfc* 4 121t^ + Sk* -]Sk*- lOifc^ + 24fc’ + 2k* - 16fc + 5)h* . 
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The deviations about the mean of (2.1) are 

(2.6) -i(fc - l)h, -i(fc - Z)h, • • • , i(fc - 3)fc, Uk - i)h. 
Therefore, 

(2.7) = 0. 

If we denote (2.6) by x, we have 

e-s «■<*> = 

The lecursion formula for central moments of (2.1) is 


(2n + 1)‘ 


I Ji/ _ I '*■ I */ - I 

jpl I 22 * J 1 

A' (2n + l)*'^** . , _ 

'^2' (r+1)! • 

Formula.s for , (w = 0, 1, • • • , 5) are given below. See [27 ; p. 27]. 

Mo:k = 1, 

fir.R — ~ IM*) 

fU:H — — 7)h^ih;B, 

( 2 . 10 ) 

= T^(3fc‘ - 18fc^ + 31)AVft«, 

ik:H = ffk(6*' - 55fc‘ + 239A* - 381 )A*mw*, 

Mi0:« = 3F’Tff(3fc'‘ - 52fc* + 410fc' - 1636*:* + 2555 )A'*mj;« • 

From the relation which connects Thiele seminyariants and the moment 
generating function, we get, see [25; p. 57], 

\ n \ {k ^ l)h . __ n 


(2n + 1)^'' . 


22 3! 


( 2 . 10 ) 


\):K “ 0, \y.n 


X2n+J:« — 0, 


(2.11) 


^2n:H — (“ 1) 


- 1 ) 


Ti — • 1 ) 2 ^ 3 ) * • < 


wlu^re \n:H represents the nth Thiele semiuvariant of a rectatigular distribution 
of discrete variates and (n = 1, 2, • • •)» Ihe Bernoulli numbers: • • • . 

In each of the cases considered in this section, corresponding formulas may be 
found for a rectangular distribution of continuous variates by setting h = m/k 
(which makes the range m with k subdivisions) and then letting A: oo . 

2. Adjustments for moments. As our basic distribution we consider the set of 
discrete variates, Jt , (t = 1, 2, • • • , iV), where some of the XiS may not be 
distinct. We assume that the given distribution is grouped in groupings 
of k'\ 



326 


J08SPH A. PIERCE 


ts of the class arc 
fc - (2r - 1 )' 


When Xi is placed in the rth position of a class, the limi 
Xi — {r — l)h and x, + (& — r)h and the class mark is x,- + 

. ~ Tk — (2r — l^n 

Thus, when the class mark is used as the value of x,- , the quantity I ^ h 


)]i 


is added to the true value of x,- . Therefore, when the expected value of a 
particular moment for “k groupings of k" is found, each variate has made a 
definite contribution as it was placed in each of the k positions of a class. 

For convenience, we define 


( 2 . 12 ) 



(2r_- 1) 
2 


h, 


(r = 1, 2, . . . , k). 


As was previously indicated, the expected value of a given moment involves 
the contribution of each variate as it occupies the k class positions. A con- 
venient method of finding these contributions is by means of a universe 
which is composed of the populations ,.A", (r == 1,2, • • , A'). The rth population 
consists of the values of the variates vchm they occupy the rth position of the 
class. Hence rX consists of = Xi + c, , (i = 1, 2, • — , A). 

The notation for moments is the same as that of Part I. Since /,l\\ is of ih(^ 
same form as the universe* studit'd in l^art 1, we use the definitions (1.1) of that, 
part. 

The ('xpected value* of the tih moment is 



Many devices have been used bv previous writers [24; j). 2(59], [25; p. 57], 
[26; p. 157], to evaluate terms of the form - ^ c . However, it should be 

K 


noticed that, the quantities , {i == 1, 2, • • • , A'), are respe*ctively identical 
w ith the deviations (2.b) about t he mean of a rectangular distribution of dise^rete? 
variates. It follows that 


= 



c,-. 


And since = 0, we have 

t\ 

(2.13) 

Formulas for ~ 0, 1 , • • • , 5) are given by (2.10), 

If the class marks are selected as the unit of x, we set h = 1 in (2.10). If the 
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class interval is chosen as the unit of x, we set ft « 1/ft in (2.10). If ft con- 
secutive values of the discrete variable are grouped in a frequency class of width 
m, we put ft » m/ft in (2.10). 

Usually we desire to estimate the value of the moments that would have been 
obtained if we had not grouped the data. Therefore (2.13) is solved for the 
moments of the ungrouped data. We have 

(2.14) ~ ^ ^28/^**'“''"'*' 
wherein 

p _ y ( — l)**(28)i p! fiapjtliiiipftB • » . 

" i( 2 pi)!l'‘l( 2 p 0 !r‘ • • • l( 2 p.)!i'‘>i! t,! • .Tx,!’ 

the summation being taken for every possible product of moments for which 

V 1 » 

S, Z) = p. 

i-l i-1 

Formulas, corresponding to (2.13) and (2.14), for a distribution of continuous 
variates are written by replacing the moment symbols for discrete variates by 
those for continuous variates. 

3. Adjustments for central moments. Consider the universe I > which consists 
of the population ,X,{r = 1,2, • ■ • , ft), where rX is the rth grouped-distribution. 

The* expected value of the <th central moment of the ft grouped-distribution is 
given by (1.3), (1.4) and (1.5) of Part, I, where now is given by (2.13) of 
the preceding section. Thu.s, the development of this section is identical with 
that of section one of Part I with the single exception that pi:„, = pi no longer 
holds but is replaced by pi;„, = pt + o correction. Therefore, the formulas for 
the adjustments for central moments may be obtained immediately from the 
formulas derived in section one, Part I, if the correeftions of the preceding section 
are inserted. We have 

(2.15) Hui, = w + ih:K - . 

(2.16) Hi:h = M3 + 6miPi;„, — 3pn:„,„, -f 2/23:,., . 

(2.17) = M4 + Q}itiir.K + M4:S + 6(i5j — 2p; + /2 s:r)/22:,,, 

+ 12pipi,:,„„, - 12 pi/28:m, - 4/2ii:m,m, 

+ 6/l2i:„,M, - 3/Z4 :m, . 

The moments of the ungrouped data can be obtained readily from formulas 

(2.15) through (2.17). 

Adjustment formulas for central moments of a distribution of continuous 
variates may be obtained from (2.13) by replacing the moment symbols for 
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discrete variates by those for continuous variates and taking the moments about 
the mean. Also, it may be observed that adjustment formulas for central 
moments of a distribution of continuous variates may be obtained from formulas 
(1.3), (1.4) and (1.5) of Part I, provided the moment symbols are exchanged as 
indicated above and terms of the form ^ are set equal to aero. 

4. Usual adjustments for Thiele seminvariants. The usual adjustments for 
Thiele seminvariants, for the univariate discrete population, may be developed 
directly by use of one of the fundamental properties of Thiele seminvariants. 

It is assumed (see [25; p. 55]) that k consecutive values of the discrete variable 
are grouped in a frequency class of width m. The k smaller intervals of width 
mjk — h go to make up the class width m, the actual points representing the k 
values of the variable being plotted at. the centers of the sul>intervals. Now, 
let us suppose that each of the k consecutive boundary points of the subintervals 
is as likely to be chosen as a boundary point of the larger intervals as any other. 
Then, if Xi is the class mark of the ith frequency class, for any true value, x, of 
the discrete variable included in this frequency class, we have 

a;,- = a* + 

in which x and e, are independent variables and Cr takes on the k values (2.12) 
with equal relative frequencies \/k. 

Since we have noted that the equally likely values which Cr may take on are 
deviations about the mean of a rectangular distribution of discrete variates, we 
employ the cumulative property of Thiele seminvariants [9; p. 4] and obtain 
directly 

(2.18) \r., = + (/ = 1,2, ...)• 

where is the /th seminvariant compuU^d from the grouped data, Xr.x is the 
<th seminvariant computed from the ungrouped data and Xr.n is defined by (2.11). 

Formulas corresponding to (2.18), for special values of /, are given by Craig 
[25; p. 57]. However, the present development indicat(*s the dependence of 
adjustment formulas on central moments of a rectangular distribution and pro- 
vides a general formula for these adjustments which is expressed completely in 
terms of Thiele seminvariants. 


6. New adjustments for Thiele seminvariants. If we accept the definition 

1 ^ 

i. rfity {t ~ 2, 3, • • •)> 

K r-1 


then (2.18) is at best only an approximation formula. We now desire exact 
formulas for for the case of a grouped-distribution of discrete variates. 
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First (1.9) is used and terms of the form m.,., • • are evaluated m terms 

of central moments by (1.3). Then terms of the form are evaluated by 
(2.13) and finally the relations between moments and Thiele seminvariants are 
employed. Exact formulas for the expected values of the second, third, and 
fourth Thiele seminvariants for grouiied-distributions of discrete variables are 


given 

below. 




(2.19) 

M1:X2 


X 2 + X2:r 

~ M2:mi • 

(2.20) 

Ml:X, 

= 

Xs + 6Xi/i2:#i, — 3/111:^, ^2 + 2/28;^, . 

(2.21) 

Mi;X4 

= 

X 4 + X4;« 

+ 12[X2 ~ 2 X 1 + MiRlihii 



+ 


— M8:/iilXi ~ 



+ 

1 2/X2i:>i|>i2 

- - 3m2:m, • 


Formulas for Thieh' seminvariants of ungroujied data in terms of expectations 
may be obtained from (2.19) through (2.21). 

Adjustment formulas for Thiele seminvariants of a distribution of continuous 
variates are given by Langdun and Ore (23; p. 231] and Craig (25; p. 57). If we 
denote the tth Thiele seminvariant of a distribution of continuous variates by 


L, , then 



(2.22) 

VllLt — 4 “ y 


where 



(2.23) 

j _o I 

yfy 2^ » 

( = 1, 2, 


Formulas (2.19) through (2.21) may be used -for continuous variates by 
changing the moment symlmls and slotting terms of the form Moi'j - 'r:***, si, •••<■< 
equal to zero. 

6. Adjustment formulas applied to a numerical problem. We consider the 
arbitrary distribution given in Table III. 


T.ABLE HI 

An Arbitrary Distribution of Discrete Variates 


V 

/ 

V 

/ 

V 

/ 

F 

1 

2 

4 

30 

7 

1 

2 + 30+1-33 

2 

8 

5 

4 

8 

1 

8 + 4+1-13 

3 

10 

6 

3 

9 

1 

10 + 3 + 1-14 
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The three grouped distributions, when the variates are grouped in “groupings 
of three,” appear in Table IV. 


TABLE IV 

DitlribuHont Derived from Data of Table III by Making the Three Posrible Orovpinge of Three 



(1) 

(2) 

(3) 

Class 

f 

Class 

/ 

Class 

/ 

1~3 

20 

0-2 

10 

-1 to 1 

2 

4-6 

37 

3-5 

44 

2-4 

48 

7-9 

3 

0-8 1 

1 

5 

5-7 

8 

10-12 

0 

9-11 

1 

8-10 

2 


Using the fixed point 4, moment-functions are computed for the distribution of 
Table III and for each of the distributions of Table IV. These quantities 
along with the average of each moment function appear in Table V. 

TABLE V

Moment-Functions of the Distributions of Table III and Table IV. Averages of Moment-Functions of Distributions of Table IV

Each entry is a numerator; its denominator is given in the last column.

Function      Dist. (1)      Dist. (2)      Dist. (3)      Avg.           Orig. Dist.    Denominator
M_1           9              -9             -30            -10            -10            60
M_2           165            171            162            166            126            60
M_3           69             81             138            96             116            60
M_4           1125           2511           1938           1858           1314           60
μ_2 = λ_2     9819           10179          8820           9606           7460           (60)^2
μ_3 = λ_3     -17442         567162         1317600        622440         642400         (60)^3
μ_4           238,849,317    557,840,277    528,282,000    441,657,198    305,034,000    (60)^4
λ_4           -50,388,966    247,004,154    294,904,800    163,839,996    138,079,200    (60)^4


Table VI gives the expected values of the moment-functions as obtained by
substituting from Table V into the formulas of sections two, three, and five.
The expected values computed from the usual formulas are also given, together
with the errors which would be made if the usual formulas were used.
TABLE VI

Expected Values of Moment-Functions Computed by Formulas

Each entry is a numerator; its denominator is given in the last column.

Expectation of   By new formulas   By usual formulas   Error          Denominator
μ_{1:M_1}        -10               -10                 0              60
μ_{1:M_2}        166               166                 0              60
μ_{1:M_3}        96                96                  0              60
μ_{1:M_4}        1858              1858                0              60
μ_{1:μ_2}        9606              9860                254            (60)^2
μ_{1:μ_3}        622440            642400              19960          (60)^3
μ_{1:μ_4}        441,657,198       416,778,000         -24,879,198    (60)^4
μ_{1:λ_4}        163,839,996       133,795,200         -30,060,796    (60)^4


7. Evaluation of $\mu_{2:M_1}$. It appears at first that it is necessary to form the
“k groupings of k” in order to evaluate the term $\mu_{2:M_1}$ which enters the precise
formula for the expected value of the variance. That was the procedure followed
by Carver [26; p. 101]. However, it is possible to evaluate $\mu_{2:M_1}$ from the
ungrouped data without forming a single grouped-distribution.

By definition,

$$\mu_{2:M_1} = \frac{1}{k}\sum_{r=1}^{k}\left({}_rM_1 - M_1\right)^2,$$

where ${}_rM_1$ is the mean of the rth grouped-distribution and $M_1$ is the mean of the
ungrouped distribution. We wish to study the terms ${}_rM_1$ and $M_1$. Consider a
set of variates $x_i$ $(i = 1, 2, \ldots, s)$, with corresponding frequencies $f_i$
$(i = 1, 2, \ldots, s)$. The $x$'s are subject to the condition $x_i - x_{i-1} = 1$, and
consequently some of the $f$'s may be zero. The mean of this distribution is
$M_1 = \sum_{j=1}^{s} f_j x_j / N$.

We define

$$F_i = f_i + f_{k+i} + f_{2k+i} + \cdots, \qquad (i = 1, 2, \ldots, k).$$

Then, if a grouped-distribution is formed with $x_1$ in the ith $(i = 1, 2, \ldots, k)$
position of a class, the mean of this grouped-distribution is

$$\frac{1}{N}\left(\sum_{j=1}^{s} f_j x_j + \sum_{j=1}^{k} F_j c_{i+j-1}\right),$$

where the subscripts of the $c$'s are taken cyclically: $c_{i-1} = c_k$ if $c_i = c_1$, and
$c_{i+1} = c_1$ if $c_i = c_k$. Similarly, if a grouped-distribution is formed with $x_1$ in
the $(i+1)$st position of a class, the mean is

$$\frac{1}{N}\left(\sum_{j=1}^{s} f_j x_j + \sum_{j=1}^{k} F_j c_{i+j}\right).$$
Thus, it is evident that, given the expression for the mean of any grouped-distribution
in which $x_1$ is in the ith position of a class, we may form the expression
for the mean of the grouped-distribution in which $x_1$ is in the $(i+1)$st
position of a class by a cyclic permutation of the $c$'s of the given expression.

Therefore, it follows that if we call ${}_rM_1$ the mean of the grouped-distribution
in which $x_1$ is in the rth $(r = 1, 2, \ldots, k)$ position of a class, then

$${}_rM_1 - M_1 = \frac{1}{N}\sum_{j=1}^{k} F_j c_{r+j-1}, \qquad (r = 1, 2, \ldots, k).$$

If we define

$$N = \sum_{j=1}^{s} f_j \quad\text{and}\quad \theta_r = \sum_{j=1}^{k} F_j c_{r+j-1},$$

then

$$\mu_{2:M_1} = \frac{1}{kN^2}\sum_{r=1}^{k}\theta_r^2.$$

Thus, it is evident that $\mu_{2:M_1}$ is a function only of the frequencies of the variates
and of the $c$'s. Because the values of the variates do not enter $\mu_{2:M_1}$, its value
can be calculated quickly.

Consider $\mu_{2:M_1}$ for the distribution of Table III, for which $k = 3$. We find

$$\theta_1 = 33c_1 + 13c_2 + 14c_3.$$

Then, by successive cyclic permutations of the $c$'s,

$$\theta_2 = 33c_2 + 13c_3 + 14c_1, \qquad \theta_3 = 33c_3 + 13c_1 + 14c_2.$$

Substituting the values $c_1 = 1$, $c_2 = 0$, $c_3 = -1$, we have $\theta_1 = 19$, $\theta_2 = 1$ and
$\theta_3 = -20$. Therefore,

$$\mu_{2:M_1} = \frac{19^2 + 1^2 + (-20)^2}{3(60)^2} = \frac{254}{(60)^2},$$

which is identical with the value which was found when Table V was used.

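It follows from the preceding development that if $F_1 = F_2 = \cdots = F_k$, then
$\mu_{2:M_1}$ is zero, since each $\theta_r$ then equals $F_1(c_1 + c_2 + \cdots + c_k)$ and the
$c$'s sum to zero.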
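The evaluation of $\mu_{2:M_1}$ from the $F$'s and $c$'s alone can be sketched in Python. This is an illustrative sketch with hypothetical function names; it assumes the cyclic subscript convention described above and the values $c_1 = 1$, $c_2 = 0$, $c_3 = -1$ used for Table III, and it cross-checks the result against the definition using the three grouped means of Table V.

```python
from fractions import Fraction


def position_totals(freqs, k):
    """F_i = f_i + f_{k+i} + f_{2k+i} + ... for i = 1..k (freqs listed for x_1, x_2, ...)."""
    return [sum(freqs[i::k]) for i in range(k)]


def mu2_of_grouped_means(F, c):
    """Return (theta_1..theta_k, mu_{2:M_1}) computed from the F's and c's only."""
    k = len(c)
    n = sum(F)
    # theta_r = sum_j F_j c_{r+j-1}; with 0-based r and j the c subscript is (r + j) mod k.
    thetas = [sum(F[j] * c[(r + j) % k] for j in range(k)) for r in range(k)]
    return thetas, Fraction(sum(t * t for t in thetas), k * n * n)


freqs = [2, 8, 10, 30, 4, 3, 1, 1, 1]  # Table III, x = 1, 2, ..., 9
F = position_totals(freqs, 3)
c = [1, 0, -1]                         # c_1, c_2, c_3 as in the text
thetas, mu2 = mu2_of_grouped_means(F, c)
print(F, thetas, mu2 * 60 ** 2)        # numerator over (60)^2

# Cross-check with the definition, using the grouped means 9, -9, -30 (over 60).
grouped_means = [Fraction(9, 60), Fraction(-9, 60), Fraction(-30, 60)]
m1 = Fraction(-10, 60)
direct = sum((g - m1) ** 2 for g in grouped_means) / 3
```

Only the three totals $F_i$ and the $c$'s enter the computation; the variate values themselves are never used, which is the point made in the text.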

8. Conclusion. The results of this paper include:

1. The derivation of general and specific formulas for the expected values of
population moment-functions.

2. The derivation of generalized sampling formulas under the condition that
samples of n are formed by selecting one variate from each population.

3. Methods for the transformation of generalized sampling formulas into the
corresponding infinite and finite sampling formulas.

4. A method for the transformation of infinite sampling formulas into the
corresponding finite sampling formulas.

5. A demonstration of the fact that adjustment formulas for moment-functions
of grouped data involve central moments of a rectangular distribution.

6. A general formula for the expected value of the tth moment of grouped data.

7. New adjustment formulas for central moments of grouped data.

8. New adjustment formulas for Thiele seminvariants of grouped data.

9. A method for the evaluation of the term $\mu_{2:M_1}$ which appears in the precise
adjustment formula for the variance.

Many thanks are due Prof. P. S. Dwyer, to whom the writer is greatly in- 
debted for advice and encouragement. 
THE ANALYSIS OF VARIANCE WHEN EXPERIMENTAL ERRORS
FOLLOW THE POISSON OR BINOMIAL LAWS 

By W. G. Cochran 

1. Introduction. The use of transformations has recently been discussed by
several writers [1], [2], [3], [4], in applying the analysis of variance to experimental
data where there is reason to suspect that the experimental errors are
not normally distributed. Two types of transformations appear to be coming
into fairly common use: $\sqrt{x}$ and $\sin^{-1}\sqrt{x}$. The former is considered appropriate
where the data are small integers whose experimental errors follow the
Poisson law, while the latter applies to fractions or percentages derived from
the ratio of two small integers, where the experimental errors follow the binomial
frequency distribution. In each case the object of the transformation is to put
the data on a scale in which the experimental variance is approximately the
same on all plots, so that all plots may be used in estimating the standard error
of any treatment comparison. The extent to which these transformations are
likely to succeed in so doing has been examined by Bartlett [2]. The object of
the present paper is to discuss the theoretical basis for these transformations in
more detail, and in particular to examine their relation to a more exact analysis.

2. Experimental variation of the Poisson type. The first step in an exact
statistical analysis of the results of any field experiment is to specify in mathematical
terms (1) how the expected values on each plot are obtained in terms of
unknown parameters representing the treatment and block (or row and column)
effects, and (2) how the observed values on the plots vary about the expected values.
In this section, the variation is assumed to follow the Poisson law.

The specification of the expected values requires some consideration. In the
standard theory of the analysis of variance, treatment and block (or row and
column) effects are assumed to be additive. In the case of a Latin square, for
example, the expected yield $m_i$ of the ith plot, which receives the tth treatment
and occurs in the rth row and the cth column, is written

$$m_i = G + T_t + R_r + C_c, \tag{1}$$

where G is a parameter representing the average level of yield in the experiment,
and $T_t$, $R_r$ and $C_c$ represent the respective effects of the treatment, row and
column to which the plot corresponds. Since the T, R and C constants are
required only to measure differences between different treatments, rows and
columns, we may put

$$\sum_t T_t = \sum_r R_r = \sum_c C_c = 0. \tag{2}$$
If the experimental errors are normally and independently distributed with
equal variance, this specification leads to very simple equations of estimation
for the unknown parameters, the maximum likelihood estimate of $T_t$, for
example, being the difference between the mean yield of all plots receiving that
treatment and the general mean. In addition to its simplicity, this type of
prediction formula is fairly suitable for general use, because it gives a good
approximation to most types of law which might be envisaged, provided that
row and column differences are small in relation to the mean yield. However,
in considering an exact analysis with Poisson variation, the prediction formula
is assumed to be chosen, without reference to computational simplicity, as being the
most suitable to describe the combined actions of treatment and soil effects.

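The probability of obtaining a given set of plot yields $x_i$ with expectations $m_i$
may be written

$$\prod_i \frac{e^{-m_i} m_i^{x_i}}{x_i!}.$$

Thus L, the logarithm of the likelihood, is given (apart from terms which do not
involve the parameters) by

$$L = \sum_i x_i \log m_i - \sum_i m_i. \tag{3}$$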

Hence the maximum likelihood equation of estimation for any parameter $\theta$
assumes the form

$$\sum_i \frac{x_i - m_i}{m_i}\,\frac{\partial m_i}{\partial\theta} = 0, \tag{4}$$

where the summation extends over all plots whose expectations involve $\theta$. The
function $\partial m_i/\partial\theta$ will usually involve a number of parameters. Since the
specification of row, column and treatment effects in a 6 × 6 Latin square requires 16
independent parameters, the solution of these equations may be expected to be
laborious, though it may be shortened by the intelligent use of iterative methods.
The problem of obtaining exact tests of significance is also difficult. The 
method of maximum likelihood provides estimates of the variances and co- 
variances of the treatment constants, which under certain conditions can be 
assumed to be normally distributed if there is sufficient replication, but this can 
hardly be considered an exact “small sample’’ solution. 

These remarks show that the exact solution is somewhat too complicated for
frequent use. The difficulty arises principally because the typical equation of
estimation consists of a weighted sum of the deviations of the observed from the
expected values, the weights being $\frac{1}{m_i}\frac{\partial m_i}{\partial\theta}$. The factor $1/m_i$ was introduced into
the weight by the Poisson variation of the experimental errors, and must be
retained in any theory which claims to apply to Poisson variation. It is, however,
worth considering whether some simplification cannot be introduced into
the equations by assuming some particular form for the prediction formula.
This line of approach seems promising when one considers the simplification
introduced into the “normal theory” case by assuming the prediction formula
to be linear.

For Poisson variation, the linear law does not appear to be particularly suitable,
since it may give negative expectations on some plots (as happens in the
numerical example considered in the next section). Further, while $\partial m_i/\partial\theta$ becomes
a constant, the factor $1/m_i$ remains in the weight.

The entire weight can be made constant by assuming a linear prediction
formula in the square roots and transforming the data to square roots. For a
Latin square, this prediction formula is written

$$\sqrt{m_i} = \alpha_i = G + T_t + R_r + C_c, \tag{5}$$

where

$$\sum_t T_t = \sum_r R_r = \sum_c C_c = 0. \tag{6}$$

To find the maximum value of (3) subject to the restrictions (6), we may use the
method of undetermined multipliers, maximizing

$$L + \lambda\sum_t T_t + \mu\sum_r R_r + \nu\sum_c C_c. \tag{7}$$


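The equation of estimation for a typical treatment constant $T_t$ becomes

$$\sum_{T_t}\frac{x_i - m_i}{m_i}\,\frac{dm_i}{d\alpha_i}\,\frac{\partial\alpha_i}{\partial T_t} + \lambda = 0, \quad\text{i.e.,}\quad \sum_{T_t}\frac{2(x_i - m_i)}{\sqrt{m_i}} + \lambda = 0, \tag{8}$$

the summation being extended over all plots receiving the treatment. If
$a_i = \sqrt{x_i}$, then by Taylor's theorem

$$x_i - m_i = (a_i - \alpha_i)\frac{dm_i}{d\alpha_i} + \frac{1}{2}(a_i - \alpha_i)^2\frac{d^2 m_i}{d\alpha_i^2} + \cdots. \tag{9}$$

If $m_i$ is reasonably large, only the first term on the right-hand side need be
retained. When $m_i$ is small, we may use, instead of the exact square root, a
quantity $a'_i$ defined so that

$$x_i - m_i = (a'_i - \alpha_i)\frac{dm_i}{d\alpha_i} = 2\sqrt{m_i}\,(a'_i - \alpha_i). \tag{10}$$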

Thus if the analysis is performed on the quantities $a'_i$ instead of on the original
data, equation (8) becomes

$$\sum_{T_t} 4(a'_i - \alpha_i) + \lambda = 0. \tag{11}$$

On substituting the expectations for $\alpha_i$ from (5), and using (6), we obtain

$$\sum_{T_t} 4(a'_i - G - T_t) + \lambda = 0. \tag{12}$$

The corresponding equation for G is

$$\sum_i 4(a'_i - G) = 0, \tag{13}$$

so that G is the general mean of the quantities $a'$. By adding equations (12)
over all treatments, and comparing the total with (13), we find $\lambda = 0$. Hence
$T_t$ is the difference between the mean of $a'$ over all plots receiving $T_t$ and
the general mean of $a'$. In this scale the simplicity of the “normal theory”
equations has apparently been recovered. Actually, the quantities $a'$ are not
known exactly, since

$$a' = \alpha + \frac{x - m}{2\alpha} = \frac{1}{2}\left(\alpha + \frac{x}{\alpha}\right), \tag{14}$$

where $\alpha = \sqrt{m}$ is the square root of the expected value of x. However, this process
provides a means of successively approximating the maximum likelihood solution, by choosing
first approximations to the quantities $\alpha$, constructing the $a'$'s, solving for the
unknown constants and hence obtaining second approximations to the expected
values. The close relation of $a'$ to $\sqrt{x}$ is seen by remembering one of the
common rules for finding square roots. This consists in guessing an approximate
root ($\alpha$), dividing x by the approximate root, and taking the mean of the
approximate root ($\alpha$) and the resulting quotient ($x/\alpha$).
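The relation between $a'$ and $\sqrt{x}$ can be checked numerically: $a' = \tfrac{1}{2}(\alpha + x/\alpha)$ is one step of the averaging rule just described (Heron's method, a special case of Newton's method) for $\sqrt{x}$, started from the guess $\alpha$. A minimal Python sketch; the function name is hypothetical, and the values 3 and 2.046 are taken from the worked example in section 3.

```python
import math


def adjusted_root(x, alpha):
    """One Heron (Newton) step toward sqrt(x) from the guess alpha: (alpha + x/alpha)/2."""
    return 0.5 * (alpha + x / alpha)


x, alpha = 3, 2.046
print(round(adjusted_root(x, alpha), 2))        # 1.76
print(alpha + (x - alpha ** 2) / (2 * alpha))   # the same quantity in the first form of (14)

# Repeating the step from any positive guess converges to the exact root.
guess = 1.0
for _ in range(6):
    guess = adjusted_root(3, guess)
print(guess, math.sqrt(3))

# A zero count still gives a positive value, a' = alpha / 2.
print(adjusted_root(0, 0.786))
```

The last line illustrates why zero yields are the plots most affected: their square root is 0, while $a'$ is half the expected square root.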

The suitability of the linear prediction formula in square roots must be considered
in any example in which the above analysis is being employed. The
law is intermediate in its effects between the linear law and the product law in
the original data. My experience is that it is fairly satisfactory for general use
(cf. [2], p. 72). An exception may occur when it is desired to test the interaction
between two treatments, both of which produce large effects. In this
case the definition chosen for absence of interaction may not coincide at all
closely with the definition implied in using the linear law in square roots. An
example of this case was given in a previous paper [1].

In this connection it should be noted that an approximate “goodness of fit”
test may be obtained of the validity of the assumptions made. Since the quantities
$a'_i$ enter into the equations of estimation with weight 4, the quantity
$4\sum_i (a'_i - \alpha_i)^2$ is distributed approximately as $\chi^2$ with the number of degrees
of freedom in the error term of the analysis of variance. Some idea of the
closeness of the approximation may be gathered by considering the simplest
case, in which only the mean yield is being estimated. In this case the observed
values x are assumed to be drawn from the same Poisson distribution, and the
sufficient statistic for the mean G is known to be $\sum x_i/n$. Since, however, the
prediction formula is here the same in square roots as in the original scale, and
since the maximum likelihood solution is invariant to change of scale, the mean
value $\alpha$ of $a'$ must be exactly $\sqrt{\sum x_i/n}$, as the reader may verify by working
any particular example. Thus $\sum 4(a' - \alpha)^2$ is found to be $\sum (x - \bar{x})^2/\bar{x}$, the
usual test for examining whether a set of values x may reasonably be assumed
to come from the same Poisson distribution. By working out the exact distribution
of $\sum (x_i - \bar{x})^2/\bar{x}$ in a number of cases [5], I previously expressed the
opinion that this quantity followed the $\chi^2$ distribution sufficiently closely for
most practical uses, even for values of the mean as low as 2. This opinion has
since been substantiated by Sukhatme [6], who sampled this distribution for
m = 1, 2, 3, 4, and 5.
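The identity used above can be verified numerically. Algebraically, with $\alpha = \sqrt{\bar{x}}$, equation (14) gives $a' - \alpha = (x - \bar{x})/(2\sqrt{\bar{x}})$, so each term $4(a' - \alpha)^2$ equals $(x - \bar{x})^2/\bar{x}$. In the sketch below the counts are hypothetical and the function name is illustrative; it also confirms that the mean of the $a'$ equals $\alpha$.

```python
import math


def single_mean_check(counts):
    """Return (alpha, mean of a', 4*sum(a' - alpha)^2, sum(x - mean)^2 / mean)."""
    mean = sum(counts) / len(counts)
    alpha = math.sqrt(mean)  # expected square root when only the mean is fitted
    adjusted = [0.5 * (alpha + x / alpha) for x in counts]  # a' from (14)
    mean_adjusted = sum(adjusted) / len(adjusted)
    chi_sq_from_roots = 4 * sum((a - alpha) ** 2 for a in adjusted)
    dispersion_index = sum((x - mean) ** 2 for x in counts) / mean
    return alpha, mean_adjusted, chi_sq_from_roots, dispersion_index


counts = [3, 0, 5, 1, 4, 6, 2, 9]  # hypothetical counts, for illustration only
alpha, mean_adjusted, chi_roots, index = single_mean_check(counts)
print(alpha, mean_adjusted, chi_roots, index)
```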

A high value of $\chi^2$ means either that the prediction formula is not satisfactory
or that the experimental errors are higher than the Poisson distribution indicates,
or that both causes are operating. These effects can sometimes be separated
by examining whether the observed yields deviate from the expected
yields in a systematic or a random manner. If the deviation is systematic, the
prediction formula is probably unsatisfactory.

The type of approach used above resembles in many features the “exact”
analysis for the probit transformation [7]. The principal difference is that in
the case of probits the transformation is made to suit the a priori prediction
formula, which postulates that the probits are a linear function of the dosage,
or of the log (dosage). Thus with probits the equations of estimation still
involve weights in the transformed scale. These do not seriously complicate
the analysis, since only two parameters require to be estimated for a given
poison. With, however, the much greater number of parameters usually involved
in specifying the results of a field experiment, the attractiveness of a
solution which does not involve weighting is greatly increased.

3. Numerical example of the square root transformation. A 5 × 5 Latin
square experiment on the effects of different soil fumigants in controlling wireworms
was selected as an example. The average number of wireworms per
plot (total of four soil samples) was just under five. Previous studies [8], [9]
have indicated that with small numbers per sample, the distribution of numbers
of wireworms tends to follow the Poisson law.

The plan and yields are shown in Table I. The first two figures under the
treatment symbols are the numbers of wireworms and their square roots respectively,
the latter being regarded as first approximations to the values $a'$. Two
of the plots receiving treatment K gave no wireworms. Since these plots are
likely to be changed most in the transition from square roots to $a'$, better
approximations were estimated for them before proceeding with the calculations.
The best simple approximations appeared to be obtained from the square roots
of the means in the original units. For the plot in the second row and second
column, the square roots of the row, column and treatment means in the original




TABLE I

Plan and number of wireworms per plot

             Col. 1   Col. 2   Col. 3   Col. 4   Col. 5   Mean
Row 1          P        O        N        K        M
  (1)          3        2        5        1        4
  (2)        1.73     1.41     2.24     1.00     2.00     1.676
  (3)        1.76     1.45     2.25     1.11     2.00     1.714
  (4)        1.77     1.46     2.25     1.10     2.00     1.716
Row 2          M        K        O        N        P
  (1)          6        0        6        4        4
  (2)        2.45    (0.39)    2.45     2.00     2.00     1.858
  (3)        2.45     0.32     2.50     2.02     2.02     1.862
  (4)        2.46     0.32     2.49     2.02     2.02     1.862
Row 3          O        M        K        P        N
  (1)          4        9        1        6        5
  (2)        2.00     3.00     1.00     2.45     2.24     2.138
  (3)        2.10     3.09     1.00     2.47     2.25     2.182
  (4)        2.13     3.08     1.00     2.46     2.25     2.184
Row 4          N        P        M        O        K
  (1)         17        8        8        9        0
  (2)        4.12     2.83     2.83     3.00    (0.79)    2.714
  (3)        4.18     2.84     2.83     3.00     0.77     2.724
  (4)        4.17     2.84     2.83     3.00     0.77     2.722
Row 5          K        N        P        M        O
  (1)          4        4        2        4        8
  (2)        2.00     2.00     1.41     2.00     2.83     2.048
  (3)        2.14     2.02     1.49     2.04     2.92     2.122
  (4)        2.10     2.03     1.50     2.05     2.90     2.116
Mean (2)     2.460    1.926    1.986    2.090    1.972    2.087
Mean (3)     2.526    1.944    2.014    2.128    1.992    2.121
Mean (4)     2.526    1.946    2.014    2.126    1.988

Treatment Means
               K        P        O        M        N
  (2)        1.036    2.084    2.338    2.456    2.520
  (3)        1.068    2.116    2.394    2.482    2.544
  (4)        1.058    2.118    2.396    2.484    2.544

(1) Original numbers. (2) Square roots. (3) Second approximations. (4) Third approximations.
units are respectively 2.000, 2.145 and 1.095, and the square root of the general
mean is 2.227. Hence

$$a' = \tfrac{1}{2}\left[2.000 + 2.145 + 1.095 - 2(2.227)\right] = 0.39.$$

The other zero value was similarly found to give $a' = 0.79$. The corresponding
estimates from the means of the square roots were considerably too low, since
the $a'$ values tend to be higher than the square roots. The use of “missing plot”
technique gave very poor approximations, because it ignores the fact that the
plots in question had zero yields.
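The starting values for the two zero plots follow from the rule just described: $\alpha$ is built from the square roots of the row, column, treatment and general means in the original units, and with $x = 0$ equation (14) gives $a' = \alpha/2$. A minimal sketch using the plan and counts of Table I; the function name is hypothetical, and row and column indices count from 0 in the code.

```python
import math

# Table I: treatment letters and wireworm counts, row by row.
PLAN = ["PONKM", "MKONP", "OMKPN", "NPMOK", "KNPMO"]
COUNTS = [
    [3, 2, 5, 1, 4],
    [6, 0, 6, 4, 4],
    [4, 9, 1, 6, 5],
    [17, 8, 8, 9, 0],
    [4, 4, 2, 4, 8],
]


def mean(values):
    return sum(values) / len(values)


def zero_plot_first_approximation(r, c):
    """Start value for plot (r, c) from square roots of means in the original units."""
    letter = PLAN[r][c]
    row = mean(COUNTS[r])
    col = mean([COUNTS[i][c] for i in range(5)])
    treat = mean([COUNTS[i][j] for i in range(5) for j in range(5) if PLAN[i][j] == letter])
    grand = mean([x for row_counts in COUNTS for x in row_counts])
    alpha = math.sqrt(row) + math.sqrt(col) + math.sqrt(treat) - 2 * math.sqrt(grand)
    # With x = 0, a' = (alpha + x / alpha) / 2 reduces to alpha / 2.
    return 0.5 * (alpha + COUNTS[r][c] / alpha)


print(round(zero_plot_first_approximation(1, 1), 2))  # row 2, column 2: 0.39
print(round(zero_plot_first_approximation(3, 4), 2))  # row 4, column 5: 0.79
```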

With the estimated values inserted, the row, column, and treatment means
of the square roots are as shown in Table I. A second approximation to $a'$
was calculated for each plot. For the plot in the first row and the first column,
the expected value in the square-root scale is

$$\alpha = 1.676 + 2.460 + 2.084 - 2(2.087) = 2.046.$$

Hence $a' = \tfrac{1}{2}(2.046 + 3/2.046) = 1.76$. These values constitute the third set
of figures in Table I. Theoretically, it is advisable to readjust the row, column,
and treatment means after each new value of $a'$ has been obtained, in order to
secure rapid convergence. This is rather laborious in practice, and a complete
set of new plot values was obtained before readjusting the means. The third
approximations obtained by this method are shown in the fourth lines in Table I
and are correct to two decimal places.
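The second and third approximations can be sketched as repeated passes of the same rule over the whole square. This is an illustrative reconstruction, not the author's worksheet. It assumes that the zero plots start at 0.39 and 0.79 as stated above and uses unrounded square roots, so results can differ from Table I in the second decimal. Like the text, it forms all new plot values before readjusting any mean. The function names are hypothetical.

```python
import math

PLAN = ["PONKM", "MKONP", "OMKPN", "NPMOK", "KNPMO"]
COUNTS = [
    [3, 2, 5, 1, 4],
    [6, 0, 6, 4, 4],
    [4, 9, 1, 6, 5],
    [17, 8, 8, 9, 0],
    [4, 4, 2, 4, 8],
]
# Estimated first approximations for the two zero plots, taken from the text.
ZERO_PLOT_START = {(1, 1): 0.39, (3, 4): 0.79}


def first_approximations():
    """Square roots of the counts, with the zero plots replaced as in the text."""
    return [
        [ZERO_PLOT_START.get((r, c), math.sqrt(x)) for c, x in enumerate(row)]
        for r, row in enumerate(COUNTS)
    ]


def one_pass(values):
    """Recompute a' = (alpha + x / alpha) / 2 on every plot from the current means."""
    n = len(values)
    row_means = [sum(row) / n for row in values]
    col_means = [sum(values[r][c] for r in range(n)) / n for c in range(n)]
    treat_means = {}
    for letter in set(PLAN[0]):
        cells = [values[r][c] for r in range(n) for c in range(n) if PLAN[r][c] == letter]
        treat_means[letter] = sum(cells) / len(cells)
    grand = sum(row_means) / n
    new_values = []
    for r in range(n):
        new_row = []
        for c in range(n):
            # Additive prediction (5); alpha stays positive for these data.
            alpha = row_means[r] + col_means[c] + treat_means[PLAN[r][c]] - 2 * grand
            new_row.append(0.5 * (alpha + COUNTS[r][c] / alpha))
        new_values.append(new_row)
    return new_values


second = one_pass(first_approximations())
third = one_pass(second)
for row in third:
    print(" ".join(f"{v:5.2f}" for v in row))
```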

It is noteworthy how closely the square roots agree with the third approxi- 
mations on all plots except those which originally gave zero yields. The differ- 
ences between the second and third approximations are trivial. 

The next step is to make a $\chi^2$ test by means of the quantity $4\sum(a' - \alpha)^2$.
From the manner in which the values $\alpha$ are constructed from the $a'$'s, it follows
that $\sum(a' - \alpha)^2$ is simply the error sum of squares in the conventional analysis
of variance of the values $a'$. The analysis of variance of the third approximations
is shown in Table II.


TABLE II

Analysis of variance of adjusted square roots

              Degrees of freedom   Sum of squares   Mean square
Rows                  4                2.9815
Columns               4                1.1190
Treatments            4                7.5815           1.8954
Error                12                4.5970           0.3831


The value of $\chi^2$ is 4 × 4.597 = 18.39, with 12 degrees of freedom, which is
just about the 10 percent level. If the hypothesis is regarded as disproved
only when $\chi^2$ exceeds the 5 percent level, the treatment means may be tested
by regarding them as approximately normally distributed with variance
(1/5) × 0.25 = 0.05, since each $a'$ has weight 4, that is, variance 1/4. It is, however,
more prudent to use the actual error mean square as an estimate of the experimental
error variance, performing the usual tests associated with the analysis of variance.
This may be justified on the grounds that the calculations have produced a set of
plot values $a'$ of equal weight. On this basis the standard error of a treatment
mean is $\sqrt{0.3831/5} = 0.2768$. Treatment K reduced the number of wireworms
significantly below all other treatments, but there is no indication of any difference
between the other treatments. The treatment means may be reconverted to the original
units by squaring.
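The $\chi^2$ probability and the standard error quoted above can be reproduced with the standard library alone. For an even number of degrees of freedom $\nu$, $P(\chi^2_\nu > x) = e^{-x/2}\sum_{k=0}^{\nu/2-1}(x/2)^k/k!$. The sketch uses that identity rather than a statistics package, and the function name is hypothetical.

```python
import math


def chi2_sf_even_df(x, df):
    """Upper-tail probability P(chi2_df > x) for even df, via the Poisson-sum identity."""
    if df % 2:
        raise ValueError("this closed form requires even degrees of freedom")
    half = x / 2.0
    term, total = 1.0, 1.0
    for k in range(1, df // 2):
        term *= half / k  # (x/2)^k / k!
        total += term
    return math.exp(-half) * total


error_ss, error_df, error_ms = 4.5970, 12, 0.3831  # Table II
chi_sq = 4 * error_ss
p_value = chi2_sf_even_df(chi_sq, error_df)
variance_if_poisson = 0.25 / 5          # variance of a treatment mean of five a' values
se_from_error_ms = math.sqrt(error_ms / 5)
print(f"chi-square = {chi_sq:.2f} on {error_df} df, P = {p_value:.3f}")
print(f"variance of a treatment mean under Poisson errors: {variance_if_poisson}")
print(f"standard error from the error mean square: {se_from_error_ms:.4f}")
```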


4. Experimental variation of the binomial type. In this case the yields are
obtained by examining a constant number n of units per plot and noting those
which possess a certain attribute (e.g., plants which are diseased). Experimental
variation is presumed to arise solely from the binomial variation of the
observed fraction p possessing the attribute about the expected fraction P, which
is specified in terms of unknown parameters representing the treatment and
soil effects.

If $r_i$ is the number possessing the attribute on a typical plot, so that $p_i = r_i/n$,
the likelihood function takes the form

$$\prod_i \frac{n!}{r_i!\,(n - r_i)!}\,P_i^{r_i} Q_i^{\,n - r_i}, \qquad Q_i = 1 - P_i.$$

Hence the terms in the logarithm which involve the unknown parameters are
given by

$$L = \sum_i \left\{ r_i \log P_i + (n - r_i)\log Q_i \right\}. \tag{15}$$


The equation of estimation for a typical constant $\theta$ is

$$\sum_i \frac{n(p_i - P_i)}{P_i Q_i}\,\frac{\partial P_i}{\partial\theta} = 0, \tag{16}$$

where the summation is over all plots whose expectations involve $\theta$.

As in the Poisson case, an exact solution is laborious because of the weights
$\frac{1}{P_iQ_i}\frac{\partial P_i}{\partial\theta}$. The unequal weighting may be removed by transforming to the
variate $a_i = \sin^{-1}\sqrt{p_i}$, and assuming that the prediction formula is linear
in the transformed scale. For a Latin square the prediction formula is assumed
to be

$$\alpha_i = G + T_t + R_r + C_c, \tag{17}$$

where the ith plot receives treatment t and lies in the rth row and cth column.
Further

$$\sum_t T_t = \sum_r R_r = \sum_c C_c = 0. \tag{18}$$

Since $P_i = \sin^2\alpha_i$, $dP_i/d\alpha_i = 2\sqrt{P_iQ_i}$. A set of variates $a'_i$ is defined so that
on each plot

$$p_i - P_i = 2\sqrt{P_iQ_i}\,(a'_i - \alpha_i). \tag{19}$$

With these substitutions, the equation of estimation for $T_t$, for instance,
becomes

$$\sum_{T_t} 4n(a'_i - \alpha_i) + \lambda = 0, \tag{20}$$

where, as before, $\lambda$ is an undetermined multiplier. The remainder of the solution
proceeds exactly as in the Poisson case, $T_t$ being found to be the difference
between the mean value of $a'_i$ over all plots receiving this treatment and the
general mean of $a'_i$. A $\chi^2$ test may be made with $\sum_i 4n(a'_i - \alpha_i)^2$.

From (19)

$$a'_i = \alpha_i + \frac{Q_i}{2\sqrt{P_iQ_i}} - \frac{q_i}{2\sqrt{P_iQ_i}} \tag{21}$$

$$\phantom{a'_i} = \alpha_i + \tfrac{1}{2}\cot\alpha_i - q_i\operatorname{cosec}(2\alpha_i), \tag{22}$$

where $q_i$ is the observed fraction which does not possess the attribute. The
calculation of approximations to $a'_i$ thus involves finding a predicted value $\alpha_i$
from the treatment and block (or row and column) means, and using equation
(22). Tables [10] of the values of $\sin^{-1}\sqrt{p_i}$, $\tfrac{1}{2}\cot\alpha_i$, and $\operatorname{cosec}(2\alpha_i)$
have been prepared to facilitate the computations. It should be noted that
these tables are in degrees, whereas the above equations assume that $\alpha_i$ is
measured in radians. In degrees, equation (20) above becomes

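$$\sum_{T_t}\frac{n}{820.7}(a'_i - \alpha_i) + \lambda = 0, \tag{23}$$

since $4(\pi/180)^2 = 1/820.7$, while

$$a'_i = \alpha_i + \frac{180}{\pi}\left(\tfrac{1}{2}\cot\alpha_i - q_i\operatorname{cosec}(2\alpha_i)\right). \tag{24}$$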

As in the Poisson case, the appropriateness of the linearly additive law in 
equivalent angles depends on the way in which treatment and soil effects operate. 
As Bliss has shown [11], the effect of the transformation is to flatten out the 
cumulative normal frequency distribution, extending the range over which it 
can be approximated by a straight line. 

5. Numerical example of the angular transformation. The data were selected
from a randomised blocks experiment by Carruth [12] on the control by mechanical
and insecticidal methods of damage due to corn ear worm larvae.
The control and the six types of mechanical protection were chosen for analysis,
the “yields” being the percentages of ears unfit for sale. The numbers of ears
varied somewhat from plot to plot, the average being 36.5, but the variations
were fairly small and appeared to be random. It was considered that variations
in the weight (4n) could be ignored in solving the equations of estimation.

TABLE III

Percentages of unfit ears of corn

                                     Blocks
Treatment          I       II      III     IV      V       VI      Means
1         (1)    42.4    34.3    24.1    39.5    55.5    49.1
          (2)    40.6    35.8    29.4    38.9    48.2    44.5    39.57
          (3)    40.7    36.0    29.4    38.9    48.6    44.6    39.70
2         (1)    23.5    15.1    11.8     9.4    31.7    15.9
          (2)    29.0    22.9    20.1    17.9    34.3    23.5    24.62
          (3)    29.1    23.1    20.3    18.2    34.3    23.5    24.75
3         (1)    33.3    33.3     5.0    26.3    30.2    28.6
          (2)    35.2    35.2    12.9    30.9    33.3    32.3    29.97
          (3)    35.5    35.3    14.5    31.0    33.4    32.4    30.35
4         (1)    11.4    13.5     2.5    16.6    39.4    11.1
          (2)    19.7    21.6     9.1    24.0    38.9    19.5    22.13
          (3)    19.8    21.7    10.0    24.4    39.9    19.6    22.57
5         (1)    14.3    29.0    10.8    21.9    30.8    15.0
          (2)    22.2    32.6    19.2    27.9    33.7    22.8    26.40
          (3)    22.6    32.7    19.2    28.0    33.7    22.9    26.52
6         (1)     8.5    21.9     6.2    16.0    13.5    15.4
          (2)    17.0    27.9    14.4    23.6    21.6    23.1    21.27
          (3)    17.4    28.2    14.5    24.0    22.1    23.2    21.57
7         (1)    16.6    19.3    16.6     2.1    11.1    11.1
          (2)    24.0    26.1    24.0     8.3    19.5    19.5    20.23
          (3)    24.3    26.2    28.8    10.9    20.1    19.5    21.63
Means     (2)    26.81   28.87   18.44   24.50   32.79   26.46   26.31

(1) Percentage. (2) Equivalent angle. (3) Second approximation.

The percentages of unfit ears, the equivalent angles and the second approximations
to $a'$ are shown, one below the other in that order, in Table III. The percentages on
individual plots vary from 2.1 to 55.6. The second approximations were calculated
from the block and treatment means of the angles. For the control plot
(treatment 1) in block I, for example, the expected value is

$$39.57 + 26.81 - 26.31 = 40.07.$$

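Since Fisher and Yates’s tables of $\alpha + \tfrac{1}{2}\cot\alpha$ and $\operatorname{cosec}(2\alpha)$ are given for
values of $\alpha$ from 45° to 90°, we take the complement of the expected value,
which is 49.93°. In the complementary scale the roles of p and q are interchanged,
so that the observed fraction of unfit ears, 0.424, takes the place of q.
Interpolating mentally from the table, we find

$$\alpha + \tfrac{1}{2}\cot\alpha = 74.0, \qquad \operatorname{cosec}(2\alpha) = 58.3.$$

Thus the second approximation to the complement of the angle is

$$74.0 - 0.424 \times 58.3 = 49.3.$$

Hence the second approximation to $a'$ is 40.7, which agrees very closely with
the equivalent angle.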
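Equation (22) can also be evaluated directly, without tables, by converting the correction from radians to degrees. The sketch below uses hypothetical function names and applies this to the control plot in block I. Computed exactly, it gives about 40.63 rather than the 40.7 obtained by mental interpolation. Working through the complement, as the tables require, gives the same value, because the two routes are algebraically identical.

```python
import math


def equivalent_angle(p):
    """Angle in degrees whose squared sine is p, i.e. the inverse sine of sqrt(p)."""
    return math.degrees(math.asin(math.sqrt(p)))


def second_approximation(alpha_deg, p):
    """a' in degrees from (22): alpha + (1/2) cot(alpha) - q cosec(2 alpha), q = 1 - p.

    The trigonometric correction is computed in radians and then converted to
    degrees (the factor 180/pi), which the tables in degrees already absorb.
    """
    a = math.radians(alpha_deg)
    q = 1.0 - p
    correction = 0.5 / math.tan(a) - q / math.sin(2 * a)
    return alpha_deg + math.degrees(correction)


expected = 39.57 + 26.81 - 26.31  # control plot, block I
a_prime = second_approximation(expected, 0.424)
via_complement = 90 - second_approximation(90 - expected, 1 - 0.424)
print(round(expected, 2), round(a_prime, 2), round(via_complement, 2),
      round(equivalent_angle(0.424), 2))
```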

On the majority of the plots, the second approximation differs by only a 
trivial amount from the equivalent angle. The plots with the three lowest 
percentages (2.1, 2.5, and 5.0) have increased somewhat more, and also one or 
two other plots where the angles deviated considerably from the expected values. 
A third set of approximations was not considered necessary. 

The analysis of variance of the second approximations is given in Table IV. 


TABLE IV

              Degrees of freedom   Sum of squares   Mean squares
Blocks                5                709.79
Treatments            6              1,531.56           255.26
Error                30                982.67            32.76


Taking n as 36.5, the expected value of the error mean square is 820.7/36.5 =
22.48. Thus $\chi^2$ = 982.67/22.48 = 43.71, with 30 degrees of freedom, which is
almost exactly at the 5 percent level. This, together with the appreciable
amount of the variance removed by blocks, indicates that the experimental
error probably contains some element other than binomial variation. As in the
preceding case, it would be wise to make the usual analysis of variance tests
with the actual error mean square.
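The constant 820.7 is $(180/\pi)^2/4$: by (20) the binomial variance of $a'$ is $1/(4n)$ in radians, and converting to degrees multiplies it by $(180/\pi)^2$. A minimal sketch reproducing the expected error mean square and the $\chi^2$ probability; it reuses the even-degrees-of-freedom identity for the tail area, and the names are hypothetical.

```python
import math


def chi2_sf_even_df(x, df):
    """Upper-tail probability P(chi2_df > x) for even df, via the Poisson-sum identity."""
    if df % 2:
        raise ValueError("this closed form requires even degrees of freedom")
    half = x / 2.0
    term, total = 1.0, 1.0
    for k in range(1, df // 2):
        term *= half / k  # (x/2)^k / k!
        total += term
    return math.exp(-half) * total


DEGREE_CONSTANT = (180 / math.pi) ** 2 / 4  # variance of a' in degrees is this / n
n_mean = 36.5                               # average number of ears per plot
expected_error_ms = DEGREE_CONSTANT / n_mean
chi_sq = 982.67 / expected_error_ms         # error sum of squares from Table IV
p_value = chi2_sf_even_df(chi_sq, 30)
print(f"{DEGREE_CONSTANT:.1f} {expected_error_ms:.2f} {chi_sq:.2f} {p_value:.3f}")
```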

6. Discussion. It must be emphasised that the solutions given above apply
to the case where the whole of the experimental error variation is of the Poisson
or binomial type. The methods are therefore likely to be useful in practice only
where the experimental conditions have been carefully controlled, or where the
data are derived from such small numbers that the Poisson or binomial variation
is much larger than any extraneous variation. The $\chi^2$ test is helpful in deciding
whether this assumption is justified. Further, the examples worked above
indicate that the transformed values form very good approximations on most
plots. It will often be sufficient to adjust only those plots which give zero or
very small values in the Poisson case, or zero or 100 percent values in the
binomial case. In this connection the method of adjustment given above may
perhaps be considered as an improvement on the empirical rule given by Bartlett
[13] of counting n out of n as $(n - 1/4)$ out of n.

Where extraneous variation becomes important, as is probably the normal
case with data derived from field experiments, there seem to be no theoretical
grounds for using the adjusted values. If we were prepared to describe accurately
the nature of the variation other than that of the Poisson or binomial
type, a new set of maximum likelihood equations could be developed. These
would, however, lead to a different type of adjustment.

The justification for the use of transformations has no direct relation to the
Poisson or binomial laws in this case, or in cases where percentages are derived
from the ratios of two weights or volumes, as in chemical analyses, or from an
arbitrary observational scoring. With percentages, for example, it may be
said, without describing the experimental variation in detail, that the variance
must vanish at zero and 100 percent and is likely to be greatest in the middle.
The formula $V = \lambda PQ$ is at least a first approximation to this situation. The
angular transformation will approximately equalize a distribution of variances
of this type, provided that λ is sufficiently small. We have, of course, returned
to an “approximate” type of argument. It follows that the original data should
be scrutinized carefully before deciding that a transformation is necessary, and
that any presumed opinions about the nature of the experimental variation
should be verified as far as possible.
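For the pure binomial case, where a proportion has variance PQ/n (that is, λ = 1/n), the equalizing effect can be checked exactly by summing over the binomial distribution. The sketch below is our own check, not part of the paper; the function name is hypothetical, and 1/(4n) is the familiar large-sample variance of arcsin √P̂ measured in radians.

```python
import math

def transformed_variances(n: int, p: float) -> tuple[float, float]:
    """Exact variances of P_hat = X/n and of arcsin(sqrt(P_hat)) in radians,
    where X is binomial(n, p). No adjustment of 0 or n is applied here."""
    probs = [math.comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]

    def variance(values):
        mean = sum(w * v for w, v in zip(probs, values))
        return sum(w * (v - mean) ** 2 for w, v in zip(probs, values))

    raw = [k / n for k in range(n + 1)]
    angles = [math.asin(math.sqrt(v)) for v in raw]
    return variance(raw), variance(angles)

n = 100
for p in (0.1, 0.3, 0.5):
    v_raw, v_ang = transformed_variances(n, p)
    print(f"p={p}: var(P_hat)={v_raw:.5f}  var(angle)={v_ang:.5f}  1/(4n)={1 / (4 * n):.5f}")
```

The raw variance changes by a factor of nearly three between p = 0.1 and p = 0.5, while the variance on the angular scale stays close to 1/(4n).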

7. Summary. This paper discusses the theoretical basis for the use of the
square root and inverse sine transformations in analyzing data whose experimental
errors follow the Poisson and binomial frequency laws respectively.

The maximum likelihood equations of estimation are developed for each case,
but are in general too complicated for frequent use. If, however, the expected
yield of any plot is assumed to be an additive function of the treatment and
soil effects in the transformed scale, a transformation can be found so that the
equations of estimation assume the simple “normal theory” form. The transforms
are closely related to the square roots and inverse sines respectively.

The nature of the assumed formula for the expected values is briefly discussed,
and a χ² test is developed for the combined hypotheses that the prediction
formula is satisfactory and that the experimental errors follow the assumed law.

Numerical examples are worked for both types of transformation. These
indicate that even for data derived from small numbers, the square roots or
inverse sines are good estimates of the correct transforms on almost all plots,
except those which give zero yields in the Poisson case, or percentages near
zero or 100 in the binomial case.
In practice, these new methods are not recommended to supplant the simple
transformations for general use, because it can seldom be assumed that the
whole of the experimental error variation follows the Poisson or binomial laws.
The more exact analysis may, however, be useful (i) for cases in which the plot
yields are very small integers or the ratios of very small integers, and (ii) in showing
how to give proper weight to an occasional zero plot yield.
NOTES

This section is devoted to brief research and expository articles, notes on methodology
and other short items.


ORTHOGONAL POLYNOMIALS APPLIED TO LEAST SQUARE FITTING 
OF WEIGHTED OBSERVATIONS 


By Bradford F. Kimball 


1. Introduction. Let the independent variable be denoted by x, and let it
range over n consecutive integral values $x_1$ to $x_n$. Thus x represents the
index-number of the ordered intervals at which observations are taken, where
the intervals are all of equal length, and an index-number is assigned in consecutive
order to every interval within the range of investigation, whether observations
occur in that interval or not. Let $y_x$ denote the observation measure
(usually referred to as observed value), if such observation exists. Let $w_x$ denote
the weight of that observation, with weight zero assigned where observations
are lacking.

To shorten the notation, summation over all values of x from $x_1$ to $x_n$ will be
denoted by the sign Σ. If a subscript and superscript are used, the context will
indicate the variable to which the summation refers. The rth binomial coefficient
will be denoted by $\binom{x}{r} = \frac{x(x-1)\cdots(x-r+1)}{r!}$.

A system of polynomials $\phi_r(x)$, r = 0, 1, 2, 3, ..., of degree r in x is said to be
an orthogonal system, for the purposes of this paper, if they satisfy the relations

(1) $\sum w_x\,\phi_r(x)\,\phi_s(x) = 0$ for $r \neq s$, and $\neq 0$ for $r = s$.

To construct the polynomials, one may write them in the form

(2) $\phi_0(x) = f_0(x) = \text{constant}$, $\quad \phi_r(x) = f_r(x) - \sum_{i=0}^{r-1} b_i\,\phi_i(x)$, r = 1, 2, 3, ...,

where the $b_i$ are constants and the $f_r(x)$ are arbitrary polynomials of degree r.
It then follows from the conditions of orthogonality that

(3) $b_i = \dfrac{\sum w_x\, f_r(x)\,\phi_i(x)}{\sum w_x\,[\phi_i(x)]^2}$.
Thus when the polynomials $f_r(x)$ have been chosen for all r, the system of
orthogonal polynomials for a given set of weights can be constructed and is
uniquely determined except for a constant factor [1].

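By virtue of the relation (2) and the conditions of orthogonality (1), it follows
that

(4) $\sum w_x\,[\phi_r(x)]^2 = \sum w_x\, f_r(x)\,\phi_r(x)$.

Define the function Φ(r, k) by

(5) $\Phi(r,k) = \sum w_x\, f_r(x)\,\phi_k(x)$, r = 0, 1, 2, 3, ....

It follows from the relations (2) and (3) that

(6) $\phi_r(x) = f_r(x) - \sum_{i=0}^{r-1} \dfrac{\Phi(r,i)}{\Phi(i,i)}\,\phi_i(x)$,

where it is to be noted that the coefficients $\Phi(r,i)/\Phi(i,i)$ in this summation are independent of x.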

Define $q_r$ and $Y_r$ by

(7) $q_r = \sum w_x\,[\phi_r(x)]^2 = \sum w_x\, f_r(x)\,\phi_r(x) = \Phi(r,r)$,

(8) $Y_r = \sum w_x\, y_x\,\phi_r(x)$.

Then if $u_r(x)$ represents the polynomial solution of degree r of the normal equations
set up for observed values $y_x$ and weights $w_x$,

(9) $u_r(x) = \dfrac{Y_0}{q_0}\phi_0(x) + \dfrac{Y_1}{q_1}\phi_1(x) + \dfrac{Y_2}{q_2}\phi_2(x) + \cdots + \dfrac{Y_r}{q_r}\phi_r(x)$.

If $E_r^2$ denotes the weighted sum of the squares of the discrepancies between
the ordinates $u_r(x)$ of the fitted curve and the observed values $y_x$, then [2],

(10) $E_r^2 = \sum w_x\,[u_r(x) - y_x]^2 = \sum w_x\, y_x^2 - \sum_{i=0}^{r} \dfrac{Y_i^2}{q_i}$.

The practicability of the use of orthogonal polynomials is thus seen to depend
upon whether the quantities Φ(r, k) and $Y_r$ can be evaluated in a reasonably
simple manner.
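Before any simplification, the construction (2)-(3) and formulas (7)-(10) can be carried out directly by weighted Gram-Schmidt on a grid. The Python sketch below does this with $f_r(x) = \binom{x}{r}$, the choice proposed next. It is an illustration under our own assumptions, not the author's computing form: x runs over 0, 1, ..., n − 1, zero weights mark missing observations, and the function name is hypothetical.

```python
from math import comb

def fit_orthogonal(ys, ws, degree):
    """Weighted least-squares polynomial of the given degree via orthogonal polynomials.

    ys, ws: observed values and weights at x = 0, 1, ..., len(ys) - 1. A missing
    observation gets weight 0 (its y entry is then ignored, but must be a number).
    Needs more positive weights than `degree`, otherwise some q_r is zero.
    Returns (fitted values u_r(x) at every x, residual sum E_r^2 from (10)).
    """
    xs = range(len(ys))
    phis, qs, Ys = [], [], []
    for r in range(degree + 1):
        f_r = [comb(x, r) for x in xs]                  # f_r(x) = binomial coefficient (x r)
        phi = [float(v) for v in f_r]
        for phi_i, q_i in zip(phis, qs):
            # b_i from (3): sum w f_r phi_i / sum w phi_i^2
            b_i = sum(w * f * p for w, f, p in zip(ws, f_r, phi_i)) / q_i
            phi = [a - b_i * p for a, p in zip(phi, phi_i)]
        phis.append(phi)
        qs.append(sum(w * p * p for w, p in zip(ws, phi)))          # q_r, (7)
        Ys.append(sum(w * y * p for w, y, p in zip(ws, ys, phi)))   # Y_r, (8)
    fitted = [sum(Y / q * ph[x] for Y, q, ph in zip(Ys, qs, phis)) for x in xs]  # (9)
    E2 = sum(w * y * y for w, y in zip(ws, ys)) - sum(Y * Y / q for Y, q in zip(Ys, qs))  # (10)
    return fitted, E2

ys = [1.0, 2.9, 0.0, 9.2, 16.8, 26.1]   # hypothetical data; the value at x = 2 is missing
ws = [1, 2, 0, 1, 1, 3]
fitted, E2 = fit_orthogonal(ys, ws, 2)
direct = sum(w * (u - y) ** 2 for w, u, y in zip(ws, fitted, ys))
print(E2, direct)   # formula (10) and the direct weighted sum should agree up to rounding
```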

The thesis of this paper is that if $f_r(x)$ is taken as the binomial coefficient
$\binom{x}{r}$, one can effectively apply the method of orthogonal polynomials. This is made
possible by the use of factorial moments in conjunction with an adding machine
that prints cumulative totals.

In treating the same problem Aitken sets up the normal equations in terms
of factorials, but considers the explicit use of orthogonal polynomials impractical.
He writes: “the arbitrary nature of the weights stands in the way of
any analytical sophistication; orthogonal polynomials emerge, but are not of
great use; and the necessity of solving the moment equations cannot be circumvented”
[3]. He prefers a determinantal method of solution of the normal
equations which the writer has found to be more involved, from a practical point
of view, than the present method, although it is elegant from a theoretical
standpoint.

Thus although the present method is not new from the point of view of
theory, the writer has found that forms made up by the use of the technique
suggested below offer an effective method for fitting polynomial curves to
weighted observations.


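2. Simplification of the problem when $f_r(x)$ and $M_r$ are defined by

$f_r(x) = \binom{x}{r}$, $\quad M_r = \sum w_x\, y_x \binom{x}{r}$.

Factorial moments $S_r$ are defined by

(11) $S_r = \sum w_x \binom{x}{r}$, r = 0, 1, 2, ....

These moments are not difficult to compute and are readily checked as computed.
The formula for Φ(r, k) then becomes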


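(12) $\Phi(r,k) = \sum w_x \binom{x}{r}\phi_k(x)$.

Thus since $\phi_0(x) = 1$, $\Phi(r,0) = S_r$, and hence $q_0 = \Phi(0,0) = S_0 = \sum w_x$.

Again

$\phi_1(x) = x - \dfrac{S_1}{S_0}$,

$\Phi(r,1) = \sum w_x \binom{x}{r}\left(x - \dfrac{S_1}{S_0}\right) = (r+1)S_{r+1} + rS_r - \dfrac{S_r S_1}{S_0}$,

since $x\binom{x}{r} = (r+1)\binom{x}{r+1} + r\binom{x}{r}$. Hence

$q_1 = \Phi(1,1) = 2S_2 + S_1 - \dfrac{S_1^2}{S_0}$.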


A recursion formula for Φ(r, k) may be obtained by expanding $\phi_k(x)$ in formula
(12) by means of (6). Thus

(13) $\Phi(r,k) = \sum w_x \binom{x}{r}\binom{x}{k} - \sum_{i=0}^{k-1} \dfrac{\Phi(r,i)\,\Phi(k,i)}{q_i}$.

The first term can be easily expressed as a linear combination of binomial coefficients,
and thus as a linear combination of moments $S_i$.
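The formulas for $Y_r$ can be broken down as follows:

$Y_0 = \sum w_x\, y_x = M_0$,

(14) $Y_r = \sum w_x\, y_x\,\phi_r(x) = \sum w_x\, y_x\left[\binom{x}{r} - \sum_{i=0}^{r-1}\dfrac{\Phi(r,i)}{q_i}\phi_i(x)\right] = M_r - \sum_{i=0}^{r-1}\dfrac{\Phi(r,i)}{q_i}\,Y_i$.

Thus

$Y_1 = M_1 - \dfrac{S_1}{S_0}\,Y_0$,

$Y_2 = M_2 - \dfrac{\Phi(2,1)}{q_1}\,Y_1 - \dfrac{\Phi(2,0)}{q_0}\,Y_0$.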

3. General technique of computation. In determining the best fitting polynomial
of degree r, the ratios $\Phi(r,i)/q_i$ play an important part.

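In a form for calculation, these quantities should receive simple designations
such as $b_i$ for a second degree curve, $c_i$ for a third degree curve, etc. Suppose
they are designated by $R_i$ for a curve of degree r; then

(15) $\phi_r(x) = \binom{x}{r} - \sum_{i=0}^{r-1} R_i\,\phi_i(x)$,

(16) $Y_r = M_r - \sum_{i=0}^{r-1} R_i\,Y_i$,

(17) $q_r = \sum w_x \binom{x}{r}^2 - \sum_{i=0}^{r-1} R_i\,\Phi(r,i)$,

and in determining Φ(r, k) for k = 0, 1, 2, ..., r − 1, formula (13) may be
written:

(18) $\Phi(r,k) = \sum w_x \binom{x}{r}\binom{x}{k} - \sum_{i=0}^{k-1} R_i\,\Phi(k,i)$.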

The fact that these quantities Ri appear as multipliers in so many of the 
fundamental formulas greatly simplifies the mechanics of the calculation, espe- 
cially when a calculating machine is used. 

In the final determination of the polynomial curve, the differences of the polynomial
at x = 0 are readily determined, since the leading term of each orthogonal
polynomial is a binomial coefficient, and thus

(19) $\Delta^k\phi_r(0) = -\sum_{i=0}^{r-1} R_i\,\Delta^k\phi_i(0)$, k = 0, 1, 2, ..., r − 1; $\quad \Delta^r\phi_r(0) = 1$.
Since the effectiveness of the method depends upon the availability of an
adding machine which records a cumulative subtotal, the determination of the
curve from the differences at the point x = 0 is not a hardship and indeed
affords a quick and accurate means of setting up the curve for purposes of
plotting and checking.

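(20) $u_r(0) = \dfrac{Y_0}{q_0} + \dfrac{Y_1}{q_1}\phi_1(0) + \dfrac{Y_2}{q_2}\phi_2(0) + \cdots + \dfrac{Y_r}{q_r}\phi_r(0)$,

$\Delta^k u_r(0) = \dfrac{Y_k}{q_k} + \dfrac{Y_{k+1}}{q_{k+1}}\Delta^k\phi_{k+1}(0) + \cdots + \dfrac{Y_r}{q_r}\Delta^k\phi_r(0)$,

$\Delta^r u_r(0) = \dfrac{Y_r}{q_r}$.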
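The step from the ratios $R_i$ to a tabulated curve can be sketched in Python. The differences of each $\phi_r$ at 0 come from (19), those of $u_r$ from (20), and the ordinates $u(0), u(1), \ldots$ are then built by repeated addition of differences, the operation the adding machine performs. Names and the sample numbers are hypothetical. $R_r$ is stored as a list of the ratios $\Phi(r,i)/q_i$ for i < r, and the coefficients are $c_r = Y_r/q_r$.

```python
def differences_at_zero(R, c):
    """Leading differences Delta^k u_r(0), k = 0..r, from (19) and (20).

    R[r][i] = Phi(r, i)/q_i for i < r (R[0] is empty); c[r] = Y_r/q_r.
    """
    r_max = len(c) - 1
    d = []                                   # d[r][k] = Delta^k phi_r(0)
    for r in range(r_max + 1):
        row = [0.0] * (r + 1)
        row[r] = 1.0                         # Delta^r phi_r(0) = 1
        for k in range(r):
            # terms with i < k vanish, because phi_i has degree i
            row[k] = -sum(R[r][i] * d[i][k] for i in range(k, r))
        d.append(row)
    return [sum(c[i] * d[i][k] for i in range(k, r_max + 1)) for k in range(r_max + 1)]

def tabulate(D, count):
    """u(0), u(1), ... built from the leading differences by repeated addition."""
    diffs = list(D)
    values = []
    for _ in range(count):
        values.append(diffs[0])
        for k in range(len(diffs) - 1):
            diffs[k] += diffs[k + 1]        # Delta^k u(x+1) = Delta^k u(x) + Delta^(k+1) u(x)
    return values

R = [[], [2.0], [1.0, 3.0]]                  # hypothetical ratios for a second degree curve
c = [5.0, 2.0, 0.5]                          # hypothetical coefficients Y_r/q_r
print(tabulate(differences_at_zero(R, c), 6))
```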

The advantage of the use of orthogonal polynomials becomes particularly
apparent when error formulae are to be used. The formula for the sum of the
squares of the discrepancies, denoted by $E_r^2$, is given above (formula (10)).
The estimated variance V of the weighted observations about the fitted curve
is thus $E_r^2/(n - r - 1)$, where n is the number of values of x used in fitting
and r is the degree of the curve fitted. Recalling that the matrix of the normal
equations is of the diagonal form with diagonal elements $q_0, q_1, \ldots, q_r$, it
follows that the coefficient $Y_k/q_k$ of $\phi_k(x)$ in the expansion of $u_r(x)$ has the
variance $V/q_k$.

Furthermore the variance of the ordinate of the fitted curve $u_r(x)$ at a point x
due to sampling variations in the determination of the coefficients of the curve,
under the assumption that the weights and values of the independent variable x
do not involve errors, has the simple form

(21) Variance of $u_r(x)$ at point $x = V\left[\dfrac{\phi_0^2(x)}{q_0} + \dfrac{\phi_1^2(x)}{q_1} + \cdots + \dfrac{\phi_r^2(x)}{q_r}\right]$,

since the covariances of the orthogonal polynomials are zero [4].
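The error formulas just given translate directly into code. The Python sketch below is illustrative: the function name and the sample inputs are hypothetical, and it takes $q_0, \ldots, q_r$ and the values $\phi_i(x)$ at one point as already computed.

```python
def error_estimates(E2, n, r, q, phi_at_x):
    """V = E_r^2/(n - r - 1); coefficient variances V/q_k; variance (21) of u_r(x).

    q: [q_0, ..., q_r]; phi_at_x: [phi_0(x), ..., phi_r(x)] at the point of interest.
    """
    V = E2 / (n - r - 1)
    coefficient_variances = [V / qk for qk in q]
    ordinate_variance = V * sum(p * p / qk for p, qk in zip(phi_at_x, q))
    return V, coefficient_variances, ordinate_variance

print(error_estimates(8.0, 6, 1, [4.0, 2.0], [1.0, 0.5]))   # (2.0, [0.5, 1.0], 0.75)
```

For a fitted constant (r = 0, $\phi_0 = 1$, $q_0 = \sum w_x$), the ordinate variance reduces to $V/\sum w_x$, the familiar variance of a weighted mean.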
COMBINATORIAL FORMULAS FOR THE rth STANDARD MOMENT
OF THE SAMPLE SUM, OF THE SAMPLE MEAN,

AND OF THE NORMAL CURVE

By P. S. Dwyer

The standard moments of the normal curve are usually expressed by the two
statements [1, p. 97]

$\alpha_{2s} = \dfrac{(2s)!}{2^s\, s!}$, $\quad \alpha_{2s+1} = 0$.

It is of some interest to note that these two statements may be generalized into
a single statement by observing that $(2s)!/(2^s s!)$ is the number of ways in which 2s
things can be grouped in pairs and that 0 is the number of ways in which 2s + 1
things can be grouped in pairs. It is obvious that an odd number of things
cannot be grouped in pairs, since there must be at least one unpaired unit. It
is clear, too, that the number of orders in which 2s things can be grouped in
pairs is $\binom{2s}{2}\binom{2s-2}{2}\cdots\binom{2}{2}$, and this is $\dfrac{(2s)!}{2^s}$. However, if the
resulting paired groups (rather than the orders of grouping) are counted, it is
seen that each paired grouping is repeated s! times, so that $(2s)!/(2^s s!)$ represents the
number of ways 2s things can be grouped in pairs. If we arbitrarily define the
number of ways 0 things can be grouped in pairs to be 1 (or if we limit our
theorem to values of r > 0), we may say: “The rth standard moment of the
normal curve is equal to the number of ways in which r things can be grouped
in pairs.”
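The statement can be checked by brute force. The Python sketch below counts pairings recursively, compares the count with $(2s)!/(2^s s!)$, and compares both with a numerical integral of $x^r$ against the normal density. The function names and the midpoint-rule settings (step 0.001 on [−12, 12]) are our own illustrative choices.

```python
import math

def pairings_formula(r: int) -> int:
    """(2s)!/(2^s s!) when r = 2s, and 0 when r is odd."""
    if r % 2:
        return 0
    s = r // 2
    return math.factorial(r) // (2**s * math.factorial(s))

def pairings_by_enumeration(items: tuple) -> int:
    """Count groupings into unordered pairs: pair the first item with each other item in turn."""
    if not items:
        return 1                 # the empty grouping, counted as 1 as the text allows
    if len(items) % 2:
        return 0
    rest = items[1:]
    return sum(pairings_by_enumeration(rest[:j] + rest[j + 1:]) for j in range(len(rest)))

def normal_moment(r: int, h: float = 1e-3, limit: float = 12.0) -> float:
    """Midpoint-rule value of the integral of x^r exp(-x^2/2)/sqrt(2 pi) over [-limit, limit]."""
    steps = int(round(2 * limit / h))
    total = 0.0
    for k in range(steps):
        x = -limit + (k + 0.5) * h
        total += x**r * math.exp(-x * x / 2)
    return total * h / math.sqrt(2 * math.pi)

for r in range(1, 9):
    print(r, pairings_formula(r), pairings_by_enumeration(tuple(range(r))), round(normal_moment(r), 6))
```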

As presented above, the combination representation is used primarily as a
means of unification of results. However, it is possible to derive the standard
moments of the normal curve in such a way as to indicate the term $(2s)!/(2^s s!)$ early
in the proof and to trace it throughout the proof. I follow the method outlined
by H. C. Carver [2] in obtaining the normal distribution as the limit of the
distribution of sample sums (or of sample means), though I use a somewhat
different notation [3, p. 6]. If we let $\binom{r}{p_1^{\pi_1}\cdots p_s^{\pi_s}}$ represent the number of
ways in which r units can be collected with $\pi_1$ groups containing $p_1$ units, $\pi_2$
groups containing $p_2$ units, etc., then the multinomial theorem can be expressed
as [3, p. 17]

(2) $(x_1 + x_2 + \cdots + x_n)^r = \sum \binom{r}{p_1^{\pi_1}\cdots p_s^{\pi_s}}\,(p_1^{\pi_1}\cdots p_s^{\pi_s})$,
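where the summation is taken over all possible partitions $p_1^{\pi_1}\cdots p_s^{\pi_s}$ of r, and
the expression $(p_1^{\pi_1}\cdots p_s^{\pi_s})$ represents the power product form [3, p. 14], which
is $\pi_1!\,\pi_2!\cdots\pi_s!$ times the monomial symmetric function. If ρ represents the
number of parts of the partition, then

$\rho = \pi_1 + \pi_2 + \cdots + \pi_s$,

while

$r = p_1\pi_1 + p_2\pi_2 + \cdots + p_s\pi_s$.

Now it can be shown from (2), in the case of infinite sampling, that

(3) $\mu_r(\text{sum}) = \sum \binom{r}{p_1^{\pi_1}\cdots p_s^{\pi_s}}\, n^{(\rho)}\, \mu_{p_1}^{\pi_1}\cdots\mu_{p_s}^{\pi_s}$, where $n^{(\rho)} = n(n-1)\cdots(n-\rho+1)$,

and since $\mu_1 = 0$, it is only necessary to sum over all partitions which have no
unit part. We have then, dividing by $[n\mu_2]^{r/2} = n^{r/2}\mu_2^{r/2}$,

(4) $\alpha_r(\text{sum}) = \sum \binom{r}{p_1^{\pi_1}\cdots p_s^{\pi_s}}\,\dfrac{n^{(\rho)}}{n^{r/2}}\,\alpha_{p_1}^{\pi_1}\cdots\alpha_{p_s}^{\pi_s}$.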


We have now a formula for the rth standard moment of the sample sum which
is expressed essentially in combination notation, since the quantity
$\binom{r}{p_1^{\pi_1}\cdots p_s^{\pi_s}}$ represents the number of ways in which r units can be grouped to form $\pi_1$
groups containing $p_1$ units, $\pi_2$ groups containing $p_2$ units, etc. All non-unitary
groupings of r are formed, each combinatorial coefficient is computed and multiplied
by $n^{(\rho)}/n^{r/2}$ times the product of the corresponding α's, and the sums are
formed. It might be noted that the formula for the rth standard moment of
the sample mean is identical with (4), while the corresponding finite sampling
(without replacement) formula is



(5) 


ar:(i) 


" ^ Vp:* • • • vp) 


The P’s are defined in previous papers [2, p. 105-6][3, p. 113]. 

We obtain the formula for the rth standard moment of the normal curve by
taking the limit of (4) as n → ∞. (H. C. Carver has pointed out [2, p. 121]
that this method of derivation imposes fewer restrictions than does the derivation
from Hagen’s hypothesis.) Each partition term will approach zero as n
approaches infinity if ρ < r/2. Now the only non-unitary partition in which
ρ is not less than r/2 is the partition $2^{r/2}$, and we can have this partition only when
r is even. Now the limit as n approaches infinity of $n^{(r/2)}/n^{r/2}$ is unity, and we
have, in the limiting case,

$\alpha_r = \binom{r}{2^{r/2}} = \dfrac{r!}{2^{r/2}\,(r/2)!}$ if r is even,

$\alpha_r = 0$ if r is odd.
Since $\dfrac{r!}{2^{r/2}\,(r/2)!}$ is the number of ways r units can be grouped in pairs when r is
even, and since 0 is the number of ways r units can be grouped in pairs when
r is odd, it follows that the rth standard moment of the normal curve is the
number of ways in which r units can be grouped in pairs.
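Formula (4) can be evaluated mechanically by listing the partitions of r without unit parts, as the text describes. The Python sketch below does this for a sum of n independent observations from a parent with given standard moments. The names are hypothetical. As a check we use an exponential parent, whose standard moments $\alpha_3 = 2$ and $\alpha_4 = 9$ are standard facts, so that (4) gives $\alpha_4 = 9/n + 3(n-1)/n = 3 + 6/n$.

```python
import math
from collections import Counter

def partitions_without_unit_parts(r, largest=None):
    """Yield partitions of r into parts >= 2, each as a non-increasing tuple."""
    if largest is None:
        largest = r
    if r == 0:
        yield ()
        return
    for p in range(min(r, largest), 1, -1):
        for rest in partitions_without_unit_parts(r - p, p):
            yield (p,) + rest

def standard_moment_of_sum(r, n, alpha):
    """alpha_r of the sum of n independent observations, following (4).

    alpha: dict mapping p -> p-th standard moment of the parent, with alpha[2] == 1.
    """
    total = 0.0
    for parts in partitions_without_unit_parts(r):
        coef = math.factorial(r)              # number of groupings of r units of this type
        for p, pi in Counter(parts).items():
            coef //= math.factorial(p) ** pi * math.factorial(pi)
        rho = len(parts)
        product = math.prod(alpha[p] for p in parts)
        total += coef * math.perm(n, rho) / n ** (r / 2) * product   # perm(n, rho) = n^(rho)
    return total

exponential = {2: 1, 3: 2, 4: 9}
for n in (1, 10, 100):
    print(n, standard_moment_of_sum(4, n, exponential))   # 3 + 6/n
```

As n grows, only the partition $2^{r/2}$ survives, reproducing the limit above.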

This development is of interest in that it makes possible the tracing of the
value $r!/(2^{r/2}(r/2)!)$ back through the various stages of the development to the coefficient
of $(2^{r/2})$ in the power product expansion of the multinomial theorem.
ON A METHOD OF SAMPLING¹


By E. G. Olds 


It is recorded that Diogenes fared forth with a lantern in his search for an 
honest man. History does not tell us how many dishonest men he encountered 
before he found the first honest one but, judging from the fact that he took his 
lantern, apparently he expected to have a long search. The general problem of 
sampling inspection, of which the above is a special case, can be stated as follows:

Given a lot, of size m, containing s items of a specified kind. If items are
to be drawn without replacement until i of the s items have been drawn, how
many drawings, on the average, will be necessary?

Uspensky² has solved a problem concerning balls in an urn, from which the
answer to the above question can be obtained for the special case i = 1. For
the general case, the distribution for the number n of the drawing in which the
ith specified item appears is given by terms of the series:

(1) $p_n = \dfrac{\binom{n-1}{i-1}\binom{m-n}{s-i}}{\binom{m}{s}}$, n = i, i + 1, ..., m − s + i,


¹ Presented to The Institute of Mathematical Statistics, Dec. 27, 1938, at Detroit, Mich.,
as part of a paper entitled “Remarks on two methods of sampling inspection.”

² J. V. Uspensky, Introduction to Mathematical Probability, McGraw-Hill, New York,
1937, p. 178.
where the first symbol indicates the number of ways of choosing i − 1 of the
specified items to fill the first n − 1 places, the second symbol indicates the
number of ways of disposing of s − i specified items in the last m − n places,
and the denominator gives the number of ways that the s items can be scattered
through the lot. In order to get the average number of draws we multiply
$p_n$ by n and sum. Then we have

(2) $\bar n = \sum_n n\,p_n = \dfrac{i(m+1)}{s+1}$.

Example 1. On a table of 200 bargain shirts there are 5 which have a 15 in.
neckband and 35 in. sleeves. How many shirts must be examined, on the
average, to find two of the desired kind?

Solution. For this case, m = 200, s = 5, i = 2. Therefore $\bar n = 2(201)/6 = 67$.
Thus, an average of 67 shirts must be examined.

Suppose $\mu_K$ represents the Kth moment about the mean, $\nu_K$ the Kth moment
about the origin, and $\nu'_K$ the moment relation given by

(3) $\nu'_K = (\nu + K - 1)^{(K)}$,

where $(\nu + K - 1)^{(K)}$ represents the result of expanding the factorial product $(\nu + K - 1)(\nu + K - 2)\cdots\nu$ and
changing the exponent of ν to the corresponding subscript. (For example,
$\nu'_3 = (\nu + 2)^{(3)} = \nu_3 + 3\nu_2 + 2\nu_1$.) It is easy to derive the recurrence relation

(4) $\nu'_K = \dfrac{(i + K - 1)(m + K)}{s + K}\,\nu'_{K-1}$.


From this result the computation of the moments about the mean is theoretically
direct. Actually, the results do not seem to be very compact. The variance is
given by

(5) $\mu_2 = \dfrac{(m+1)(m-s)}{(s+1)^2(s+2)}\,[i(s+1) - i^2]$.
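The distribution (1), the mean (2), the recurrence (4) and the variance (5) can be checked against one another in exact rational arithmetic. In the Python sketch below, $\nu'_K$ is read as the ascending factorial moment $E[n(n+1)\cdots(n+K-1)]$, which is what (3) expresses. Function names are hypothetical, and Example 1 supplies the numbers.

```python
from fractions import Fraction
from math import comb

def draw_distribution(m, s, i):
    """p_n of (1): probability that the i-th specified item appears at draw n."""
    total = comb(m, s)
    return {n: Fraction(comb(n - 1, i - 1) * comb(m - n, s - i), total)
            for n in range(i, m - s + i + 1)}

def rising_factorial_moments(m, s, i, K_max):
    """nu'_K = E[n(n+1)...(n+K-1)] for K = 0..K_max, from recurrence (4)."""
    nu = [Fraction(1)]
    for K in range(1, K_max + 1):
        nu.append(Fraction((i + K - 1) * (m + K), s + K) * nu[-1])
    return nu

def variance_formula(m, s, i):
    """mu_2 from (5)."""
    return Fraction((m + 1) * (m - s) * (i * (s + 1) - i * i), (s + 1) ** 2 * (s + 2))

# Example 1: m = 200 shirts, s = 5 of the desired kind, i = 2 wanted.
dist = draw_distribution(200, 5, 2)
mean = sum(n * p for n, p in dist.items())
var = sum(n * n * p for n, p in dist.items()) - mean**2
print(mean, var, variance_formula(200, 5, 2))
```

The variance also follows from the recurrence as $\nu'_2 - \nu'_1 - \nu'^2_1$, since $E[n^2] = \nu'_2 - \nu'_1$.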


In case s is unknown and n is known for a particular value of i, we may
estimate s (or rather $\frac{1}{s+1}$) by using the relation $n = \dfrac{i(m+1)}{s+1}$. Then

(6) $\left(\dfrac{1}{s+1}\right)_{\text{est.}} = \dfrac{n}{i(m+1)}$,

and the variance, using this estimate, is given by

(7) Variance of $\left(\dfrac{1}{s+1}\right)_{\text{est.}} = \dfrac{1}{n + i(m+1)}\cdot\dfrac{n}{i(m+1)}\left[\dfrac{n}{i} - 1\right]\left[1 - \dfrac{n}{m+1}\right]$.


Example 2. In order to check a box of 144 screws, screws are drawn until
10 good screws are obtained. In a particular case only 10 drawings were necessary.
Estimate the number of good screws in the lot.

Solution. Here m = 144, i = 10, n = 10. The estimate for s is obtained
from $(1/(s+1))_{\text{est.}} = 10/[10(145)] = 1/145$. As might be expected, the conclusion
is that all the screws are good. Furthermore, the variance of the estimated
quantity is zero.
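A small Python sketch of the estimate (6) and of a variance estimate of the form (7). The form assumed here is $\frac{1}{n + i(m+1)}\cdot\frac{n}{i(m+1)}\left[\frac{n}{i} - 1\right]\left[1 - \frac{n}{m+1}\right]$, which is what (5) yields for the variance of $n/[i(m+1)]$ once the estimate is substituted for $1/(s+1)$. The function name is hypothetical.

```python
from fractions import Fraction

def estimate_from_draws(m, i, n):
    """Estimate of 1/(s+1) from (6), the implied s, and the variance estimate assumed for (7)."""
    theta = Fraction(n, i * (m + 1))                  # estimate of 1/(s+1)
    s_hat = 1 / theta - 1
    variance = (theta * Fraction(n - i, i) * Fraction(m + 1 - n, m + 1)
                / (n + i * (m + 1)))
    return theta, s_hat, variance

print(estimate_from_draws(144, 10, 10))   # Example 2: (Fraction(1, 145), Fraction(144, 1), Fraction(0, 1))
```

A further check: when n equals its expectation $i(m+1)/(s+1)$, the expression gives exactly $\mu_2/[i(m+1)]^2$ with $\mu_2$ from (5). For example, m = 59, s = 4, i = 1, n = 12 gives 11/450 both ways.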

It is obvious that the number of draws necessary to obtain any particular
number of specified items is correlated with the numbers of draws for lesser
numbers of items. To investigate this, let us suppose that $n_j$ represents the
number of draws to obtain exactly j specified items and that $x_j = n_j - n_{j-1}$.
It follows immediately from our previous results that

(8) $E(x_1) = E(x_2) = E(x_3) = \cdots = \dfrac{m+1}{s+1}$.

This result could be obtained from the fact that, corresponding to any arrangement
of the lot for which $x_a = a$ and $x_b = b$, there is another arrangement
where $x_a = b$ and $x_b = a$, formed by moving a − b of the non-specified items
from the first group to the second. From this fact we see, also, that

(9) $E(x_1^2) = E(x_2^2) = E(x_3^2) = \cdots$.

But $x_1 = n_1$ and $\sigma^2_{n_1} = d\,[(s+1) - 1] = ds$, where $d = \dfrac{(m+1)(m-s)}{(s+1)^2(s+2)}$.

Therefore,

(10) $\sigma^2_{x_1} = \sigma^2_{x_2} = \cdots = ds$.

But, from our previous formula we have

$\sigma^2_{n_2} = d(2s - 2)$, $\sigma^2_{n_3} = d(3s - 6)$, etc.

Since $n_2 = x_1 + x_2$, it follows that

$\sigma^2_{n_2} = \sigma^2_{x_1} + 2r_{x_1x_2}\,\sigma_{x_1}\sigma_{x_2} + \sigma^2_{x_2}$,

where $r_{x_1x_2}$ is the correlation between $x_1$ and $x_2$. Therefore,

(11) $r_{x_1x_2} = -1/s$.

Also, since $x_1 = n_2 - x_2$, it follows that

(12) $r_{n_2x_2} = \sqrt{\dfrac{s-1}{2s}}$.

Likewise, from $x_2 = n_2 - x_1$, we get

(13) $r_{n_2x_1} = \sqrt{\dfrac{s-1}{2s}}$.

Finally, we obtain the three general results

(14) $r_{n_in_j} = \sqrt{\dfrac{i(s-j+1)}{j(s-i+1)}}$, i < j,

(15) $r_{n_ix_j} = \sqrt{\dfrac{s-i+1}{is}}$, j ≤ i,

(16) $r_{n_ix_j} = -\sqrt{\dfrac{i}{s(s-i+1)}}$, j > i.
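For small lots these correlations can be verified by listing every equally likely set of positions of the s specified items. The Python sketch below does so for m = 12, s = 4, a size chosen by us, and compares the results with (11) and with the general forms taken here for (14)-(16): $r_{n_in_j} = \sqrt{i(s-j+1)/[j(s-i+1)]}$ for i < j, $r_{n_ix_j} = \sqrt{(s-i+1)/(is)}$ for j ≤ i, and $r_{n_ix_j} = -\sqrt{i/[s(s-i+1)]}$ for j > i. Helper names are hypothetical.

```python
import math
from itertools import combinations

def correlation(m, s, f, g):
    """Exact correlation of f and g over all C(m, s) equally likely sets of draw numbers.

    Each arrangement is the tuple (n_1, ..., n_s) of draws at which the specified items appear.
    """
    arrangements = list(combinations(range(1, m + 1), s))
    F = [f(a) for a in arrangements]
    G = [g(a) for a in arrangements]
    mf, mg = sum(F) / len(F), sum(G) / len(G)
    cov = sum((u - mf) * (v - mg) for u, v in zip(F, G))
    return cov / math.sqrt(sum((u - mf) ** 2 for u in F) * sum((v - mg) ** 2 for v in G))

def n_(j):
    """n_j: the draw at which the j-th specified item appears."""
    return lambda a: a[j - 1]

def x_(j):
    """x_j = n_j - n_{j-1}, with n_0 = 0."""
    return lambda a: a[j - 1] - (a[j - 2] if j > 1 else 0)

m, s = 12, 4
print(correlation(m, s, x_(1), x_(2)), -1 / s)                                    # (11)
print(correlation(m, s, n_(2), n_(3)), math.sqrt(2 * (s - 2) / (3 * (s - 1))))   # (14), i=2, j=3
print(correlation(m, s, n_(2), x_(1)), math.sqrt((s - 1) / (2 * s)))             # (15), i=2, j=1
print(correlation(m, s, n_(2), x_(3)), -math.sqrt(2 / (s * (s - 1))))            # (16), i=2, j=3
```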


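Example 3. The cards of a deck are turned one by one until two aces have
appeared. The second ace appears when the 36th card is turned. How many
more cards should one expect to have to turn to find a third ace?

Solution. Here m = 52, s = 4, i = 2, $n_2 = 36$.

Then $\bar n_2 = 21.2$, and $r_{n_2x_3} = -\sqrt{\dfrac{2}{4\cdot 3}} = -\dfrac{1}{\sqrt 6}$.

Also $\sigma_{x_3} = \sqrt{4d}$ and $\sigma_{n_2} = \sqrt{6d}$. Since $\dfrac{x_3 - \bar x_3}{\sigma_{x_3}} = r_{n_2x_3}\,\dfrac{n_2 - \bar n_2}{\sigma_{n_2}}$, we have

$x_3 - \dfrac{53}{5} = -\dfrac{1}{\sqrt 6}\cdot\dfrac{2}{\sqrt 6}\left(36 - \dfrac{106}{5}\right) = -\dfrac{74}{15}$, so that $x_3 = \dfrac{17}{3}$.

Of course this result could have been obtained more directly by noting that
there were two aces left among the 16 remaining cards, so that, by (2) with i = 1,
the expected number of further cards is (16 + 1)/(2 + 1) = 17/3.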
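The regression used in Example 3 can be written in exact arithmetic. With $\sigma^2_{n_i} = d\,i(s+1-i)$ from (5) and $\text{cov}(n_i, x_{i+1}) = -id$, which follows from $r_{x_jx_k} = -1/s$ and $\sigma^2_x = ds$, the slope $r\,\sigma_x/\sigma_n$ reduces to $-1/(s+1-i)$. The Python sketch below uses this slope and compares the result with the direct argument. Names are hypothetical.

```python
from fractions import Fraction

def regression_estimate(m, s, i, n_i):
    """Estimate of x_{i+1} from n_i: E(x) + slope * (n_i - E(n_i)), with slope -1/(s+1-i)."""
    mean_x = Fraction(m + 1, s + 1)          # E(x_j) from (8)
    slope = Fraction(-1, s + 1 - i)
    return mean_x + slope * (n_i - i * mean_x)

def direct_estimate(m, s, i, n_i):
    """s - i specified items remain among m - n_i unturned items; apply (2) with i = 1."""
    return Fraction(m - n_i + 1, s - i + 1)

print(regression_estimate(52, 4, 2, 36), direct_estimate(52, 4, 2, 36))   # 17/3 17/3
```

Simplifying the regression algebraically gives $(m + 1 - n_i)/(s + 1 - i)$, so here the two estimates agree for every $n_i$, not only in this example.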


Conclusion. The results given in this note might be useful when it is neces- 
sary to estimate the number of items to be drawn in order to secure a desired 
number of a particular type, such as may be the case in obtaining a sample 
with previously defined characteristics. Also the note disproves such intuitive 
notions as the one that when looking for a desired record, one is most likely to 
have to search the whole pile to find it. As far as methods of sampling inspec- 
tion are concerned, the one implied in this note has little to recommend it. 

Carnegie Institute of Technology,

Pittsburgh, Pa. 


RANK CORRELATION WHEN THERE ARE EQUAL VARIATES¹

By Max A. Woodbury
If there is given a set of number pairs

(1) $(X_1, Y_1), (X_2, Y_2), \ldots, (X_N, Y_N)$,

we may assign to each variate its “rank” (i.e., one more than the number of
corresponding variates in the set greater than the given variate). In this way
there is obtained a set of pairs of ranks

(2) $(x_1, y_1), (x_2, y_2), \ldots, (x_N, y_N)$.

¹ Presented at the fall meeting, Mich. section of the Math. Assn. of America, Nov. 18,
1939, Kalamazoo College.
If we assume that $X_i \neq X_j$ and $Y_i \neq Y_j$ when $i \neq j$, then it follows that
each integer from 1 to N appears once and only once in the x’s, and the same
holds for the y’s. This leads at once to the formulas:

(3a) $\sum_{i=1}^{N} x_i = \sum_{i=1}^{N} y_i = \sum_{i=1}^{N} i = N(N+1)/2$,

(3b) $\sum_{i=1}^{N} x_i^2 = \sum_{i=1}^{N} y_i^2 = \sum_{i=1}^{N} i^2 = N(N+1)(2N+1)/6$.

When these results are substituted in the expression for the product moment
correlation coefficient we have, after simplifying [1],

(4) $\rho = 1 - 6\sum D_i^2/[N(N^2 - 1)]$, where $D_i = x_i - y_i$.

If we consider the case of equal variates and follow the rule for assigning
ranks given in the first paragraph, the resulting method is known as the bracket-rank
method. The use of (4) in the calculation of ρ by this method is not
strictly valid, because not every integer appears in the summations, and so
neither (3a) nor (3b) is true.

The more accurate mid-rank method assigns to each of the equal variates
the average of the ranks that would be assigned if we were to give them an
arbitrary order. This method preserves (3a) but not (3b). In this paper $\rho_M$
indicates the value of ρ as calculated by (4) when the mid-rank method is used.

In a method due to DuBois [2], the equal variates are assigned the same rank
so as to satisfy (3b). In this case (3a) is not satisfied.

If we assign the ranks to the equal variates in an arbitrary way, then (3a)
and (3b) are of course satisfied and the use of (4) is valid. There are two
disadvantages to such a method: first, the equal variates are treated differently,
and second, the assignment of ranks is arbitrary. These difficulties are removed
if one uses the average of the values of ρ corresponding to all possible ways of
arbitrarily assigning ranks to the equal variates. Since ρ is linear in $\sum_i D_i^2$, the
average value of ρ may be obtained from the average value of $\sum_i D_i^2$ and the use
of (4).

Let us first consider the simple case of two equal variates in one of the variables,
say X. It is clear that there are only two possible ways of assigning
ranks, and that if we arrange the series by the assigned x ranks, the resulting
series differ only in the y ranks corresponding to the equal X variates. If we
denote the two x ranks to be assigned by m and m + 1 and the y’s corresponding
for a particular arrangement by $y_m$ and $y_{m+1}$, we have for the average $\sum_i D_i^2$ the
expression

(5a) $\sum_{i=1}^{m-1}(i - y_i)^2 + \sum_{i=m+2}^{N}(i - y_i)^2 + \tfrac12\left[(m - y_m)^2 + (m + 1 - y_{m+1})^2 + (m - y_{m+1})^2 + (m + 1 - y_m)^2\right]$.
By the mid-rank method the corresponding expression is

(5b) $\sum_{i=1}^{m-1}(i - y_i)^2 + \sum_{i=m+2}^{N}(i - y_i)^2 + (m + \tfrac12 - y_m)^2 + (m + \tfrac12 - y_{m+1})^2$.

The correction to be added to the mid-rank $\sum_i D_i^2$ to get the average $\sum_i D_i^2$ is,
by subtracting (5b) from (5a) and simplifying,

(6) $\Delta_2 = \tfrac12$.

To get $\Delta_K$ in the more general case of several equal variates, we need only consider
the difference between the average value of $\sum_i D_i^2$ and that obtained by the
mid-rank method. If there are K equal X variates we may assign the ranks
in K! ways; this results in K! permutations of the y ranks for the sets arranged
in order of their assigned x ranks. In (K − 1)! of these permutations $y_{m+j}$ corresponds
to the x rank m + i, so that the correction to the mid-rank $\sum_i D_i^2$ is

(7) $\Delta_K = \dfrac{(K-1)!}{K!}\sum_{j=0}^{K-1}\sum_{i=0}^{K-1}(m + i - y_{m+j})^2 - \sum_{j=0}^{K-1}\left(m + \dfrac{K-1}{2} - y_{m+j}\right)^2 = \dfrac{1}{K}\sum_{j=0}^{K-1}\sum_{i=0}^{K-1}\left(i - \dfrac{K-1}{2}\right)^2 = \dfrac{K(K^2-1)}{12}$.

It is to be noticed that the correction is positive and depends only on the number
of equal X variates. From this it can be concluded that for more than one
group of equal variates, no matter whether X’s or Y’s, we can obtain the average
$\sum_i D_i^2$ by computing a correction for each group and then adding these corrections
to get the total correction to the mid-rank $\sum_i D_i^2$. Then, as before noted,
we can by (4) calculate the average ρ (denoted as $\bar\rho$).

This correction to $\sum_i D_i^2$ may be converted into a correction to $\rho_M$. That is, if

(8) $\delta_{N,K} = \dfrac{6\Delta_K}{N(N^2 - 1)} = \dfrac{K(K^2 - 1)}{2N(N^2 - 1)}$,

then

$\bar\rho = \rho_M - \sum_i \delta_{N,K_i}$,

where the summation extends over all groups of equal variates, and $K_i$ is the
number of equal variates in the ith group.
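The claim that the average of $\sum D_i^2$ over all arbitrary rank assignments equals the mid-rank value plus $\sum \Delta_K$, with $\Delta_K = K(K^2-1)/12$ for each group and ties allowed in both variables, can be checked by exhaustive enumeration on a small example. The Python sketch below uses exact fractions. The data and function names are hypothetical, and rank 1 goes to the largest value, as in the first paragraph.

```python
from collections import Counter
from fractions import Fraction
from itertools import permutations, product

def rank_blocks(values):
    """Group subjects by equal value; blocks get consecutive ranks, rank 1 for the largest value."""
    blocks, start = [], 1
    for v in sorted(set(values), reverse=True):
        members = [k for k, u in enumerate(values) if u == v]
        blocks.append((members, list(range(start, start + len(members)))))
        start += len(members)
    return blocks

def mid_ranks(values):
    ranks = [Fraction(0)] * len(values)
    for members, block in rank_blocks(values):
        for k in members:
            ranks[k] = Fraction(sum(block), len(block))
    return ranks

def tie_breakings(values):
    """Every assignment of distinct ranks that orders the equal variates arbitrarily."""
    blocks = rank_blocks(values)
    for choice in product(*(permutations(block) for _, block in blocks)):
        ranks = [0] * len(values)
        for (members, _), perm in zip(blocks, choice):
            for k, rank in zip(members, perm):
                ranks[k] = rank
        yield ranks

def sum_d2(rx, ry):
    return sum((a - b) ** 2 for a, b in zip(rx, ry))

def check_correction(xs, ys):
    """Return (average sum D^2 over all tie-breakings, mid-rank sum D^2 + sum of Delta_K)."""
    totals = [sum_d2(rx, ry) for rx in tie_breakings(xs) for ry in tie_breakings(ys)]
    average = Fraction(sum(totals), len(totals))
    corrections = sum(Fraction(K * (K * K - 1), 12)
                      for v in (xs, ys) for K in Counter(v).values())
    return average, sum_d2(mid_ranks(xs), mid_ranks(ys)) + corrections

print(check_correction([3, 3, 1, 2, 2, 2], [1, 5, 5, 4, 2, 0]))
```

The agreement is exact. Every arbitrary assignment has the same $\sum x_i^2$, the average of the assigned ranks within a group is its mid-rank, and a group of K consecutive ranks has sum of squared deviations $K(K^2-1)/12$ about that mid-rank.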

A table of $\delta_{N,K}$ for different values of N and K is given, and also a table of
$\Delta_K$. The values $\Delta_K$ are given in the top row of the table, while the $\delta_{N,K}$ are
given in the rows below.
Table of Δ_K and δ_{N,K}

N \ K      2      3      4      5      6      7      8      9     10     11     12     13
  Δ_K 0.5000  2.000      5     10   17.5     28     42     60   82.5    110    143    182
    3  .1250      -      -      -      -      -      -      -      -      -      -      -
    4  .0500  .2000      -      -      -      -      -      -      -      -      -      -
    5  .0250  .1000  .2500      -      -      -      -      -      -      -      -      -
    6  .0143  .0571  .1429  .2857      -      -      -      -      -      -      -      -
    7  .0089  .0357  .0893  .1786  .3125      -      -      -      -      -      -      -
    8  .0060  .0238  .0595  .1190  .2083  .3333      -      -      -      -      -      -
    9  .0042  .0166  .0417  .0833  .1458  .2333  .3500      -      -      -      -      -
   10  .0030  .0121  .0303  .0606  .1061  .1697  .2546  .3636      -      -      -      -
   11  .0023  .0091  .0227  .0455  .0795  .1273  .1909  .2727  .3750      -      -      -
   12  .0017  .0070  .0175  .0350  .0612  .0979  .1469  .2098  .2885  .3846      -      -
   13  .0014  .0055  .0137  .0275  .0480  .0769  .1154  .1648  .2266  .3022  .3929      -
   14  .0011  .0044  .0110  .0220  .0385  .0615  .0923  .1319  .1813  .2418  .3143  .4000
   15  .0009  .0036  .0089  .0179  .0313  .0500  .0750  .1071  .1473  .1964  .2554  .3250
   16  .0007  .0029  .0074  .0147  .0257  .0412  .0618  .0882  .1213  .1618  .2103  .2676
   17  .0006  .0025  .0061  .0123  .0214  .0343  .0515  .0735  .1011  .1348  .1752  .2230
   18  .0005  .0021  .0052  .0103  .0181  .0289  .0433  .0619  .0851  .1135  .1476  .1878
   19  .0004  .0018  .0044  .0088  .0154  .0246  .0368  .0526  .0724  .0965  .1254  .1596
   20  .0004  .0015  .0038  .0075  .0132  .0211  .0316  .0451  .0620  .0827  .1075  .1368
   21  .0003  .0013  .0032  .0065  .0114  .0182  .0273  .0390  .0536  .0714  .0929  .1182
   22  .0003  .0011  .0028  .0056  .0099  .0158  .0237  .0339  .0466  .0621  .0807  .1028
   23  .0002  .0010  .0025  .0049  .0086  .0138  .0208  .0296  .0408  .0543  .0708  .0899
   24  .0002  .0009  .0022  .0043  .0076  .0122  .0183  .0261  .0359  .0478  .0622  .0791
   25  .0002  .0008  .0019  .0038  .0067  .0108  .0162  .0231  .0317  .0423  .0550  .0700
   26  .0002  .0007  .0017  .0034  .0060  .0096  .0144  .0205  .0282  .0376  .0489  .0622
   27  .0002  .0006  .0015  .0031  .0053  .0085  .0128  .0183  .0252  .0336  .0437  .0556
   28  .0001  .0005  .0014  .0027  .0048  .0077  .0115  .0164  .0226  .0301  .0391  .0498
   29  .0001  .0005  .0012  .0025  .0043  .0069  .0103  .0148  .0203  .0271  .0352  .0448
   30  .0001  .0004  .0011  .0022  .0039  .0062  .0093  .0133  .0184  .0245  .0318  .0405
   35  .0001  .0003  .0007  .0014  .0025  .0039  .0059  .0084  .0116  .0154  .0200  .0255
   40  .0000  .0002  .0005  .0009  .0016  .0026  .0039  .0056  .0077  .0103  .0134  .0171
   45  .0000  .0001  .0003  .0007  .0012  .0018  .0028  .0040  .0054  .0072  .0094  .0120
   50  .0000  .0001  .0002  .0004  .0007  .0011  .0016  .0023  .0032  .0043  .0055  .0070
   60  .0000  .0001  .0001  .0003  .0005  .0008  .0012  .0017  .0023  .0031  .0040  .0051
   70  .0000  .0000  .0001  .0002  .0003  .0005  .0007  .0010  .0014  .0019  .0025  .0032
   80  .0000  .0000  .0001  .0001  .0002  .0003  .0005  .0007  .0010  .0013  .0017  .0021
   90  .0000  .0000  .0000  .0001  .0001  .0002  .0003  .0005  .0007  .0009  .0012  .0015
  100  .0000  .0000  .0000  .0000  .0001  .0002  .0003  .0004  .0005  .0007  .0009  .0011
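Since $\Delta_K = K(K^2-1)/12$ and $\delta_{N,K} = K(K^2-1)/[2N(N^2-1)]$, the table can be regenerated, or extended to other N and K, with a few lines of Python. The function names are hypothetical. Values are rounded to four places here. A few recomputed cells may differ in the last digit from the printed entries: for N = 9, K = 3 the formula gives .0167 against the printed .0166.

```python
from fractions import Fraction

def big_delta(K):
    """Correction to the mid-rank sum of D^2 for one group of K equal variates, (7)."""
    return Fraction(K * (K * K - 1), 12)

def small_delta(N, K):
    """Corresponding correction to rho_M, (8)."""
    return Fraction(K * (K * K - 1), 2 * N * (N * N - 1))

def print_table():
    Ks = range(2, 14)
    Ns = list(range(3, 31)) + [35, 40, 45, 50, 60, 70, 80, 90, 100]
    print("N \\ K" + "".join(f"{K:>7}" for K in Ks))
    print("  Δ_K" + "".join(f"{float(big_delta(K)):>7g}" for K in Ks))
    for N in Ns:
        cells = (f"{float(small_delta(N, K)):>7.4f}" if K < N else f"{'-':>7}" for K in Ks)
        print(f"{N:>5}" + "".join(cells))

print_table()
```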
As an example of the use of the table we will consider the following problem
[2, p. 56], with the ranks assigned as for the mid-rank method.

Subject     I      II
A           1      2.5
B           4      10
C           4      2.5
D           4      5
E           4      7
F           4      2.5
G           7      8
H           8      2.5
I           9.5    6
J           9.5    12
K           11     11
L           13     13
M           13     9
N           13     14

For the mid-rank method we have

$\sum_{i=1}^{14} D_i^2 = 119.5$, N = 14,

$\rho_M = 1 - \dfrac{6(119.5)}{14(196 - 1)} = 0.7374$.

Referring to the table we find that

K_i     Δ_{K_i}    δ_{14,K_i}
2       0.5        0.0011
3       2.0        0.0044
4       5.0        0.0110
5       10.0       0.0220
Total   17.5       0.0385

We know that

$\bar\rho = 1 - \dfrac{6(119.5 + 17.5)}{14(196 - 1)} = 0.6989$,

and in terms of $\delta_{N,K}$,

$\bar\rho = 0.7374 - 0.0385 = 0.6989$.

The value given by DuBois for his method is 0.7511.
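The worked example can be reproduced in a few lines. The Python sketch below takes the mid-ranks exactly as listed for subjects A to N and reads the tie groups off the repeated mid-rank values. In each column, every repeated value corresponds to one group of equal variates. Variable names are our own.

```python
from collections import Counter

# Mid-ranks for subjects A..N as listed in the example (columns I and II).
rank_I = [1, 4, 4, 4, 4, 4, 7, 8, 9.5, 9.5, 11, 13, 13, 13]
rank_II = [2.5, 10, 2.5, 5, 7, 2.5, 8, 2.5, 6, 12, 11, 13, 9, 14]

N = len(rank_I)
sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_I, rank_II))
rho_M = 1 - 6 * sum_d2 / (N * (N * N - 1))

# Each repeated mid-rank value marks one group of K equal variates.
group_sizes = [K for ranks in (rank_I, rank_II) for K in Counter(ranks).values() if K > 1]
correction = sum(K * (K * K - 1) / 12 for K in group_sizes)     # sum of Delta_K
rho_bar = 1 - 6 * (sum_d2 + correction) / (N * (N * N - 1))
print(sum_d2, correction, round(rho_M, 4), round(rho_bar, 4))   # 119.5 17.5 0.7374 0.6989
```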


Conclusion. A method has been developed for the treatment of rank correla- 
tion where there are groups of equal variates. The method consists of applying 
a generally small correction to the value as ordinarily calculated by the mid- 
rank method in order to find the value which would be obtained by averaging 
the values of the rank correlation coefficient for all possible ways of arbitrarily 
assigning ranks to the equal variates. Thanks are due Professor P. S. Dwyer,
without whose aid and encouragement this paper would not have been written.
NOTE ON THEORETICAL AND OBSERVED DISTRIBUTIONS OF
REPETITIVE OCCURRENCES

By P. S. Olmstead

1. A simple problem of repetitive occurrences. Two questions which the
engineer often desires to answer whenever he has a new type of apparatus or a
new design of an old type of apparatus are: How many times will it perform
its intended function without failure? and How many times will it fail to perform
its intended function in a given length of time? To answer them, he selects a number
of what he believes to be identical units of the apparatus and gives each unit a
performance test under a uniform test procedure. The number of satisfactory
operations prior to the first observed failure to perform this operation is called
a “run” and is a measure of the type desired for each unit.

If it is assumed that the probability of failure at any operation is a constant, q,
and the probability of satisfactory operation is 1 − q, or p, then the mathematical
probabilities of runs of 0, 1, 2, 3, ... satisfactory operations for any
unit are

(1) $q,\ pq,\ p^2q,\ p^3q,\ \ldots$

respectively.

Let x denote the number of satisfactory operations in any run. The mean
value of x, say $m_x$, is given by

(2) $m_x = \dfrac{p}{q}$.

The variance of x is

(3) $\sigma_x^2 = \dfrac{p}{q^2}$.

The first step in practice is to determine whether there exists a constant
probability, p, by means of the application of the operation of statistical
control.¹ Expressions (1), (2), and (3) provide the necessary information for doing
this. When a constant probability exists, as evidenced by at least 26 consecutive
samples of 4 units each, the following practical procedure has been found
to be satisfactory.

1. An estimate of p (or q), the sole parameter of the distribution, can be
obtained from the average length of run in the sample. If p is less than 0.6
and if the sample size is large, a reasonably good estimate of p can be obtained
from the proportion of the sample having runs of zero length.

2. The probability of getting runs of length x or more is p*. Thus, if a 
minimum (or maximum) value of the probability, p*, is chosen, a maximum 

> W. A. Shewhart, "Slatiatical Method from the VietepoitU of Quality Control," The De- 
partment of Agriculture Graduate School, Washington, 1939, Chapter I. 
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(or minimum) expected length of run can be computed for use as a criterion 
for looking for assignable causes of variation in the' length of individual runs 
by using the estimated value of p. 

3. The average and standard deviation to be used in calculating the limits 
to be applied to successive samples of rational subgroups in accordance with 
the Shewhart* Criterion I are given by Equations (2) and (3) in which the 
estimates of p and q are substituted. 

2. Application to a signal transmission problenL The theoretical solution 
given above is a direct answer to the first question at the head of this note. 

TABLE I 

Observed distributions of runs of x occurrences of event E for various test periods of 

apparatus life 


No. of 
Occurrences 
per Period 

Freq. 





Test Period 





1 

2 

3 

4 

5 

6 

7 

8 

n 

15 

X 












0 

7lo 

878 

1519 

961 

723 

541 

407 

343 

266 

160 

77 

1 

rii 

77 

226 

207 

206 

171 

148 

129 

97 

70 

35 

2 

th 

2 

31 

44 

55 

68 

46 

52 

39 

37 

27 

3 

nz 

1 

3 

8 

18 

15 

19 

13 

22 

19 

10 

4 

n4 


2 

1 

2 

— 

6 

5 

5 

7 

3 

5 

Wb 



— 

1 

1 

3 

1 

1 

5 

2 

6 

ne 



1 



1 


— 

1 

2 

7 

ni 








1 

— 

— 

8 

n$ 









2 

1 

Sample 


« 










Size 

n 

958 

1781 

1222 

1006 

796 

630 

543 

431 

301 

157 


The second question is also of interest particularly when failure to perform an 
operation does not impair the apparatus unit for performance of additional 
operations. In cases of this type, the engineer often lets his test continue for 
test periods of particular lengths, measured in numbers of operations or some- 
times in intervals of time (i.e., time intervals are often considered to be propor- 
tional to numbers of operations) and observes the number of failures during the 
test period for each imit. Thus, he may, after he has assured himself that 
control exists, arrange his data for each test period to show the frequency of 
occurrence of 0, 1, 2, 3, • • • failures per unit. 

Data of this type which arc typical of those found in other studies made 


*Loc. cit. 
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during the past two years are presented in Table I. These were obtained in a 
signal transmission study in which the data for successive periods were obtained 


TABLE II 

Comparison of observed and theoretical values of averages and variances for 

distributions of Table I 


Statistioor 


Test Period 

Parameter 


1 

2 „ 

8 

4 

fi 

6 

7 

j 

8 

n 

15 


observed 

.916 

1 

.853 

.786 

.719 

.679 

.646 

i 

.632 

.617 

.532 

.491 


observed 

.098 

.171 

.269 

.381 

.448 

.543 

.537 

.633 

.917 

1.026 

V 

theoretical* 

.091 

.172 

.272 

.390 

.471 

.548 

.583 

.620 

.881 

1.039 

observed 

.091 

.200 

.343 

.497 

.556 

.832 

.760 

1.075 

1.783 

1.921 

2 V 
* ? 

theoretical* 

.098 

.202 

.346 

1 

.542 

.693 

.848 

.924 

1.005 

1.6582.117 


* Based on assumption that $ is the true value of q. 


TABLE III 

Theoretical distributions corresponding to distributions of Table I calculaled by 


using q ^ as the true value of q 
n 


No. of 
Ooourrenoes 
per Period 

Freq. 

1 

2 

3 

4 

T«et Period 

5 6 




15 

X 












0 

no* 

878.0 

1519.0 

961.0 

723.0 

541.0 

40f.0 

343.0 

266.0 

160.0 

77.0 

1 

ni 

73.3 

233.5 

205.3 

1 202.8 

173.3 

144.1 

126.4 

101.9 

74.9 

39.2 

2 

»j 

1 6.1 

32.9 

43.8 

1 56.9 

55.5 

51.0 

46.6 

39.0 

35.1 

20.0 

3 

ng 

.5 

4.8 

9.4 

16.0 

17.8 

18.0 

17.1 

14.9 

16.5 

10.2 

4 

fii 

.1 

.7 

2.0 

4,5 

5.7 

6.4 

6.3 

6.7! 

7.7 

5.2 

5 

ni 


.1 

.4 

1.3 

1.8 

2.3 

2.3 

2.2' 

3.6 

2.6 

6 

no 



.1 

.4 

.6 

.8 

.9 

.8 

1.7 

1.4 

7 

n? 




.1 

.2 

.3 

.3 

.3 

.8 

.7 

8 

ng 





.1 

.1 

.1 

.1 

.4 

.3 

9 or over 

no.« 








.1 

.31 

.4 

Sample 












Sise 

n* 

958 

1781^ 

1222 

1005 

796 

630 

543! 

431 

301 

157 


* The observed values of no and n form the basis for the calculated distributions. 


for separate units. Since each set of these data passed the scrutiny for control, 
there is justification for assuming that a statistical universe exists and that its 
functional form may be derived from the observed distribution. It was found 
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that these data were consistent with the assumption that, where the probability 
of non-occurrence of a failure on a unit in the test period was 9, the probability 
of exactly x failures on a unit was p’q. This set of mathematical probabilities 
is shown in (1) with 7 redefined to apply in this case to non-occurrence of a 
failure. 

Observed and “Theoretical” values of the averages and variances for the 
observed distributions are shown in Table II. The basis for calculating the 
theoretical values was to take the ratio (designated 7) of no to n for each distri- 
bution as the estimate of the true value, 7. Distributions as shown in Table III 

TABLE IV 

TeM of Jit of theoretical to observed distriJmtiom {Table III and Table /, respectively) 


Tost Period 



1 

2 

3 

. . 

4 

5 

6 

^ 1 

8 

11 

15 


2.24 

0.20 

0.32 

2.09 

9.79 

0.65 

3.201 

6.27 

1.07 

3.98 

Degrees of 
Freedom 

1 

2 

1 

2 

i 

i 

3 

3 

3 

3 

3 

4 

4 

Px* 

.13 

.90 

.87 

.56 

.02 

.87 

.36) 

.10 

.90 

.41 


* Minimum number in cell for theoretical distribution taken as 5. 


were calculated from each 7. These distributions were tested against the ob 
served distributions by means of the x* test with the results shown in Table IV, 
which are all within reasonable limits of what might be expected when a con- 
stant probability exists. 

3. Conclusions. When a constant probability applies to each operation in a 
repetitive process this note shows how to establish criteria for identifying signifi- 
cantly long or short lengths for individual runs and significantly high or low 
average lengths for groups of several runs. A problem taken from the field of 
signal transmission gives assurance of the existence of this type of distribution 
in practice. 

Belt. Telephone LABORATOBiBa, 

New York, N. Y. 



THE DISTRIBUTION THEORY OF RUNS 
By a. M. Mood 

1. Introduction. In stud 3 dng a particular sample, the order in which the 
elements of the sample were drawn is frequently available to the statistician. 
This important information is usually entirely neglected by him. Such dis- 
regard must be attributed, to a considerable extent, to the unsatisfactory state 
of mathematical devices for using the knowledge in question. One reasonable 
mathematical method for handling this information, the one to be used in this 
paper, is to make use of the distribution of runs. A run is defined as a succession 
of similar events proceeded and succeeded by different events; the number of 
elements in a run will be referred to as its length. 

The distribution theory of runs has had a stormy career. The theory seems 
to have been started toward the end of the nineteenth century rather than in the 
days of Laplace when there was so much interest in games of chance. In 1897 
Karl Pearson [1], in a discussion of data taken from the roulette tables at Monte 
Carlo, wrote . . the theory of runs is a very simple one.^^ In this book he 
developed no theory but it is evident from his computations that he regarded the 
distribution of runs as a special case of the multinomial distribution. The 
multinomial method, besides evading the issue somewhat and raising questions 
of random sampling, also gives incorrect results when one is interested in runs 
of more than one kind of element. In 1899 Karl Marbe [2] derived an expression 
for the mean of the number of iterations of a given length from a binomial 
population. This result was incorrect because he neglected dependence between 
overlapping iterations. An iteration is defined as a sequence of similar events; a 
run of length t is counted as i — « + 1 iterations of length s for 5 < t. Marbe 
has assembled a great mass of data with the object of proving the popular 
hypothesis that a ^‘head^^ becomes highly probable after a long succession of 
^^tails'^ has appeared. Ordinary significance tests applied to his data do not 
support this contention, but Marbe continues to advocate it [3] and [5]. Of 
course, he has been severely criticised by many mathematical statisticians. 

In 1904 Griinbaum [6] derived the mean of the number of runs of given length 
from a binomial population by the multinomial method. The first correct 
formulae were derived in 1906 by Bruns [7] who found the mean and variance of 
the number of iterations of given length in samples from a binomial population. 
In a book published in 1917 von Bortkiewicz correctly derived for the first time 
the mean and variance of nms from a binomial population using a method similar 
to that of Bruns. This book [8] contains a great many formulae for means and 
variances of runs and iterations under various special circumstances; a large 
portion of it is devoted to an exhaustive criticism of Marbe’s work. In 1921 von 

367 
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Mises [9] showed that the number of long runs of given length was approximately 
distributed according to the Poisson law for large samples. 

It was not until 1925 (so far as the author has been able to ascertain) that an 
actual distribution function appeared when Ising [10] gave the number of ways of 
obtaining a given total number of runs (without regard to length) from arrange- 
ments of two kinds of elements. Stevens [12] in 1939 published the same dis- 
tribution and described a x* criterion for significance. Wald and Wolfowitz [13] 
in 1940 published the same distribution and showed that it was asymptotically 
normal. These papers are all concerned with random arrangements of a fixed 
number of elements of each of two kinds; the last mentioned paper describes a 
very interesting application of the distribution to the problem of testing the 
hypothesis that two samples have come from the same continuous distribution. 
Wishart and Hirshfeld [11] in 1936 derived the distribution of the total number of 
runs (again without regard to length) in samples from a binomial population and 
showed it was asymptotically normal. 

In tliis paper we shall derive distributions of runs of given length both from 
random arrangements of fixed numbers of elements of two or more kinds, and 
from binomial and multinomial populations. Also we shall give the limiting 
form of these distributions as the sample size increases. These limiting dis- 
tributions are all normal. The distribution problem is, of course, a combina- 
torial one, and the whole development depends on some identities in combinatory 
analysis, — some new and some well known to students of partition theory. 

The paper will be divided into two parts. The first will deal with distribu- 
tions obtained from random arrangements of a fixed number of each kind of 
element. The second will deal with distributions of elements from a binomial 
or multinomial population. 


Past I 

2. Distribution of runs of two kinds of elements. Consider random arrange- 
ments of n elements of two kinds, for example o’s and n* Vs with ni -|- n* = n. 
Let ru denote the number of runs of o’s of length i, and let ru denote the number 
of runs of 6’s of length i. For example the arrangement 

abbabaaabbaaa 

will be characterized by the numbers rn = 2, ru = 2, rji = 1, r 22 = 2, and all 
other ru = 0. Also we let n = 2 r* = 2 denote the total number of 

i % 

runs of o’s and Vs respectively. Throughout the paper a binomial coeflBcient 
will be denoted by 

(:) - ra 


and this is defined to be zero when m < k. A multinomial coefficient will often 
be denoted by 
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ml 

mjlms! • • • m,l 


(2.3) 2m< = m, m< > 0 

and when such a coefficient is to be summed over the indices m< the two condi- 
tions (2.3) are always understood and will not be repeated; other conditions on 
the indices will be placed below the summation sign. 

Given a set of numbers r<,- (t = 1, 2; ^' = 1, 2, • • • , n.) such that ]C P’a = > 


there are 


■’■‘land hi 

L»’id 


different arrangements of the runs of a’s and b’s respec- 


tively. Hence the total number of ways of obtaining the set r,-,- is 


muf) 


[:;]M 


Fin , n) 


where Fin , n) is the number of ways of arranging ri objects of one kind and n 
objects of another so that no two adjacent objects are of the same kind. Thus 


Fin ,n) = 0 if I ri — r* I > 1, 
= 1 if I ri - rj 1 = 1, 
= 2 if ri = r* 


Since there are possible arrangements of the a’s and b’s, we have at once the 
distribution of the r*/ 


Pin,) = 




Certain marginal distributions will also be of interest. To obtain, for example, 
the distribution of the m , it is first necessary to sum over all partitions of 
n* . This is easily accomplished by finding the coefficient of in 


ix + X* + X* + . • = *'^*(1 + a: + »* + --’Y 


il-x)'* 




The term- corresponding to < = ti* — r* gives the desired result: 
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We have then 


P(r„,rO. L’'«JVr.-i; 

(:) 


and summing this over , a slight simplification gives 


cr' 


The distribution (2.6) summed over n, and rj, gives by means of (2.7) 


( 2 . 10 ) 


P(ri , rj) = 




which is essentially the distribution derived by Wald and Wolfowitz [13], and 
summing this over rj we get the distribution discussed by Stevens [12] 


( 2 . 11 ) 


P(ri) = 


(rix::') 

(:) 


Another marginal distribution which will Ije useful is obtained by summing 
(2.9) over ru for i > k. If we let 

«v = »’v, j<k, 

A;— 1 

8u = 5Jry, A = 

k 1 

we must then sum the multinomial coefficient 


rub! • • • nnj 

over all partitions of ni — ^ such that every part is greater than A; — 1. This 
is given by the coefficient of in 

(/ + + . . .)•“ = x**‘* Z ^ ^ x' 

(-0 \ Si* ~ 1 / 

thus we have 

(2.12) Z(« ~i 

n*! • • • Tini \ ““ 1 / 


n*! • • • Tini 
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where 22(« denotes summation over all positive integers n* , n *+ 1 , 
such that ^jrif = iii — A. This identity with (2.9) gives 

k 


(2.13) P(«„) = 


r«i‘ /nj 4 - iWni - 4 - (fc — l)8u - l\ 

L«uJ \ 81 A - 1 / 


t — Ij 2| • • • I fc. 


Another useful distribution analogous to (2.13) is derived by considering runs 
of both kinds of elements. If we define «», (j = 1, 2, • • • , h) and B in terms of 
r*/ just as «i< and A were defined above, it follows at once from (2.6) and (2.12) 
that 


(2.14) Pisu,st,) 

[:][:]("■; 


A {k l)SlA: 

SlJb “■ 1 


- B - (A - l)«iA 

^ Bih- 1 




i = 1, 2, ••• , k;j = 1, 2, , A 

These last two distributions should be the most useful for applications. The 
long runs have been added together to form the new variables au and sa* thus 
decreasing materially the number of variables as compared with (2.6) and (2.9) 
while at the same time little information is lost. One is free to choose k and h 
so that the number of variables is appropriate for the data at hand. Moreover, 
it is shown in Section 5 that these variables are asymptotically normally distrib- 
uted so that one may apply a simple x test of significance for ^‘randomness of 
elements with respect to order^^ when dealing with large samples. We shall 
then be able to test whether a sample has been “randomly'^ drawn in a certain 


3. Moments for runs of two kinds of elements. Instead of dealing with the 
ordinary moments we shall obtain formulae for the factorial moments because 
the expressions are much more compact. As is customary, a factorial will be 
denoted by 

(3.1) = x(x — l){x — 2) • • • (x — a + 1), 

and is defined to be 1. Of course the ordinary moments are determined by 
the factorial moments by means of relations of the type 

i-O 

A recent discussion of the coefficients C< has been given by Joseph [14]. The 
mathematical expectation of a function /(r) will be denoted by 
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E(J{r)) - Z/WPW. 


Of course E is a linear operator. We shall require the following identity 




ni - X) *o< - l\ 

, n - X) “ 1 / 


where 2(i) denotes summation over all positive integers rn , ru , • • . , ri», such 

that 2^ iru = ni . (3.3) may be verified by differentiating 
1 

“ (<i® + ^ + • • • 

o,- times with respect to << (t = 1, 2, • • • , nO, then finding the coefficient of 
a:"* after putting U = 1. The identity (3.3) enables us to find the factorial 
moments of the variables in the distribution (2.9) for we have 






/ni - 22 ^ - A /«* + 1\ //”\ 

Vn - 22 o* - 1 / \ I'l ) ' W/ 




= (n* + !)«“'> 


\ ri - 22 Oi ” 

(n - 2!) (t + l)o< 

^ ni - 22 


The siun on ri involved in the last step is given by the identity 




which is readily obtained by equating coefficients of in 


(1 + x) 




(1 + xY 


We shall give here the means, variances and covariances obtained from (3.4) 
(3.6) = (n,+ l)®n{*Vn“'^", 

,, n,‘«(«, + n|(n* + l)*n{«n{'' 




n^‘>(n* + l)«»nl«> (n,+ l)<%i« 




+ !)<%{'» 
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These will be needed in the section dealing with asymptotic distributions. The 
moments for the distribution (2.6) follow at once from (3.3) as 


(3.9) 


a ri,fj 


ni- J^ioi - l\/nt - 52 a ■■ 
ji-'EtOf- 1 / -Hh- 



The summation on r* is accomplished by putting r* = ri — 1, n , and ri + 1, 
but after that has been done it is necessary to expand the product of the two 
factorial factors in factorial powers of the lower index of one of the binomial 
coefficients. This is easily done for the first few moments, but there appears 
to be no simple expression for the general case. The means, variances and 
covariances of ru are given by (3.6), (3.7) and (3.8) and those of rj< are obtained 
from these equations by interchanging ni and nj . The other covariances are 


(3.10) 


„(«•+» „<y+2)* 

Wi fh j, »i Wj 


+ 2 


Til 712 
fid’ll) 


(n, + l)<”(n» + 


A slight variation of the method above will give the moments of the «i» in 
the distribution (2.13). An accent on a summation sign will indicate that the 
term corresponding to i = A; is to be omitted. Ditferentiating 

= [t\X + + • • • + th^lX^ ^ + tk{x^ + + • • • 


a< times with respect to U and finding the coefficient of a;”' after putting = 1, 
we obtain 


(3.11) 


s 

'Uu-^ 1 


n«ii 


(<><) 


[:](’ 


ni — A — (A; — l)«u “ 


^1* 


0 


‘ V / 


This with (2.13) gives by the same steps as used in obtaining (3.4) 


(5.12) ^(n .!:•■) = (». + «-(” 

The first two moments are 

(3.13) BM 


(3.14) 


nS(n, + l)n{'+« _ n,(n, + l)*n{‘>ni*> 

^(i+jfc+D 
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(3.15) 


_ («2 + l)**'ni**’ (nj + l)ni 
n'**’ n<*> 


f /, _ (n, + Dn^ 
\ n»> /• 


The others are, of course, given by (3.6), (3.7) and (3.8). 

The joint moments of the variables in (2.14) as obtained from (3.11) are 


(3.16) 


E (n •ir’C’) - £ “* *) 

it *|.*j \ 8 i-2^0<-1 / 


/nj - ^jb, + 6* - I 

\ St - bj - 1 

In addition to the covariances (3.10) we shall need 


F(.Si, 8t) 




„«+S)„0-+l) 1 o«, <*+*)*,<>+» *,<*+*> *,w> J. 

Ul 712 + ^ni 712 I O ^2 “r ^2 






(3.18) 


Til ^2 10^1 ^2 V^l I 1 KW 2 + ^2 


1) 




The moments of r in the distribution (2.11) may be derived easily by means 
of (3.5) as 

(3.19) 


From which 


(3.20) 


E{n) = 


(ns + l)ni 


(3.21) 


2 _ (ns -t- l)‘^*ni^* 
nn®^ 


4. Distribution and moments of runs of k kinds of elements. This section 
is a generalization of the proceeding two sections to several kinds of elements. 
The case k = 2 was treated separately because the special character of the 
fimction Fin , rs) in this instance made the distribution comparatively simple. 
Now we shall be interested in k kinds of elements denoted by Oi , • • • , o* and 
we shall suppose there are n< elements of the ith kind. We let r„- denote the 
number of runs of elements of the fth kind of length j, and put 

ic n i 

ri = 2n,. 

1 >-1 

The same argument as was used in deriving (2.6) gives 
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(4.1) 



where the function F{ri , rt , • ■ ■ , rt), which will be referred to hereafter simply 
as F(ri), represents the number of different arrangements of ri objects of one 
kind, r* objects of a second kind, and so forth, such that no two adjacent objects 
are of the same kind. We shall be able to give the explicit expression for F{r<) 
after examining the marginal distribution Piu). This is obtained by s ummin g 
(4.1) over r, with r,,- fixed by means of the identity (2.7) giving 



Despite our present meager knowledge of F(r,) it is possible to find the 
moments of the as distributed by (4.2). Since 52 Pi^i) = 1, we have the 

'i 

identity 

(4.3) 

From this the moments are easily derived. If we put 

(4.4) Ui = Ui — r< 
we have 

£ n n : J) nr.) - r n (», - ro-' n ("; : J) 

= £ n (». - 1)'“’ n f *) to 

= n (». - 1)'"’ £ n i" ') TO 

- n (». - 1)'-" F" ■ ^ *1 

i-i L J 


The summation involved in the last step is given by (4.3). On dividing the 
last equation by we get the factorial moments of the Ui 

(4.5) ® (n »)''■) - n («, - «'•- [V-ll/tJ- 


From these equations the moments of the r< may be found; the means, variances 
and covariances are 
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Uiin — n< + 1) 


_ nfnf 


nf\n — Ui + 1)^ 


It is clear that 


ip{Q = Coefficient of IJ in 
1 


(xi + . • • + Xfc)* n (2^1 + • • • + Xi^i + uxi + Xi+i + • • • + ^ j J 

is a generating function for the moments of the variables Ui . This generating 
function will enable us to find the exact expression for F(r») for we have 

k 

P{ui = n«) = Coefficient of IJ in (p{t^ 

- [*,]n [’“;']/[:]• 

Also 

and equating the expressions on the right of the last two equations we have 


(4.10) 


z r^lnK"'! 

jpu\ _ «<•»*/ L®*J L ^*7 J 

n(rO 

- ,S, [:]n[Vi‘] 


Zin'ii-Tf-ttf 

in which the prime on the n'ij indicates that the indices corresponding to j ’= i 
are to be omitted; hence i takes all the values 1, 2, • • • , A; and j takes all values 
1, 2, ■ • • , A; except i because the index tin has been cancelled with rii — u in 
the binomial coefficient in the denominator of (4.10). It is clear from (4.11) 
that F(u) may be expressed as follows 

F(r<) «= CT U x7^*(xi + . . . + x*)*(x 2 + x» + . . • + x*)'*“* 

(4.12) 1 

(xi + X| + • • • + **)’' * • • • (®1 + • • • + Xj-O"^* * 


in which “CT’* is an abbreviation for “constant term of.” 
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We are now in a position to obtain moments of the variables in the distribu- 
tion (4.1) by means of identities similiar to (4.3). As an illustration we compute 




n n (ici + • • • + uxi + • • • + X*) 


~ CT n xp'Cxi + • • • + xjb)’‘'^(xs 4* • • • + XkY 


Jl<-0 


R] 


(n — ni) 
n<*> 


(a) 


or 


(4.13) 

1 

The moments of r,, may be computed from identities of this type together with 
(3.3). The first two moments are 

(4.14) ^(n,) = (n - n< -I- l)®ni"/n‘'^ 

(4.15) - »<)”’(« - «< + l)®/n”^+* 

(4.16) E(r,yr«) = - ny)‘’>(n - n< -b 

EM = (n.- -3 1) { (n.- -j + l)«>(n. - 1 -|-1)‘* 


-f 2(n - n< - n,)(n< - j + l)(n, - t + l)(n, - < + n,- - j) 

+ (n — n.- — «.)^*’[(«* — < + 1)® + 2(n< — j + l)(n, — < 1) 

+ (fk - J + 1)'”] -b 2(n - tii - n,y*\ni - j + n, - t + 2) 

+ in-fk- n.y*'] -b 2(n.- -j-1) {(n.- - j + 1) 


(4.17) 


•(n, - t + 1)'®’ -b (n - nj — n,)[2(ni — j + l)(n, - < -b 1) 

-b («. — < + 1)'*’] (n ~ Tii — n,)'*’[2(n, — f + 1) + (». — J + 1)] 

+ (n - ny - n.)'"} + 2(n. - f - 1) {(n. - f + 1) 

• (n< — i + 1)^*^ -b (w — n< — nt)[2(n< — j ’ + l)(n. — f + 1) 

+ («< ~ i + 1)^**] + (n — n< — n.)'*’[2(n< — j -b 1) + (n. — f + 1)] 

-b (n — — «.)”’} + 4 — {(”< “ i + !)(«• — < + 1) 

-b (n - n< - n,)(n< - j -b ». - < + 2) -b (n - n< - n,)®}. 
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Such a lengthy expression as this last one can hardly be useful to the statistician, 
and for this reason we shall not define variables analogous to the 8u and sj,- 
of Section 2 and take the time and space to find their moments. 


6, Asymptotic distributions. We shall show that some of the distributions 
obtained previously are asymptotically normal when the n,- become large in 
such a way that the ratios n,/n remain fixed. The description ‘‘asymptotically 
normal” means that the distribution approaches the normal distribution imi- 
formly over any finite region as n* — ♦ « . The ratios rii/n will be denoted by 
Ct j hence S = 1. The symbol 0(1/77“) will represent any function such that 

Lim n®0 ( - ) = L < oo. 

n-»oo 

We shall not, of course, be able to get any limit theorems for distributions 
like (2.6) or (2.9) because the number of independent variables increases with 
n. We shall consider first the distribution (2.13) whose asymptotic character 
is given in the following theorem. 

Theorem 1. The variables 


( 6 . 1 ) 


Xi 


Xk 


% 2 

\/h 

““ 716*62 

Vn 


i < k 


are asymptotically normally distributed with zero means and variances and co- 
variances 


ffii = + 1)C/ + 1 ) 616 * - ijet - 2ei], i, j <k,i 9 ^ j 

6 ’i< ~ 61* *62[(i "I" 1)*6i62 — i*6* — 26 i] •+• 6162 , i K. k 
ffik = 6l''**~*62[(t 4* 1 )A»i 62 — tkCi — 61], i < fc 
ffick — 6 i*~* 62 [A:*( 6 i — l)et — ej + 6*62 . 


The limiting means, variances and covariances are obtained from the relations 
(3.6), (3.7), (3.8), (3.13), (3.14) and (3.15). 

To demonstrate this theorem we make the substitutions 


( 63 ) 


Hi — ne< 

8u = m\el + 

«u “ ne\et + y/nxk 

k 

«i = n6iCi + Vn S Xt 


t = 1,2 

i = 1 , 2 , 


. ,k-l 


k-l 


A — n(6i — e\ — keiCi) + \/n 2] ixi 

1 

in (2.13), and estimate the factorials by means of Stirling’s formula 
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(5.4) 


m! — ^2rni!^e " + 0 • 


The result is an unwieldy expression which we shall not present at the moment. 
First we note that the exponential factors cancel out because the sum of the 
lower indices of a binomial or multinomial coefficient is equal to the upper index. 
Also we simplify the expression by considering in detail only terms which involve 
the Xi ; the normalizing constant can be determined from the final limit function. 
Any function of the parameters will be represented by the letter K. Thus in 
(5.4) we need consider only the factor All factorials will be of the form 


(5.6) 


m = no + y/nL{x) + b 


where L(x) is a linear function of the Xi , and a and b are independent of n and 
Xi . Now 


(no + >AL(x) + 


/ T U \ wo+v'nL(*)-H!>+i 


\ a y/n an) 


= (l + ^ 

\ a yjn 



no+Vw 


and log = K y/ nL{xi log no + (no +jVni'(®) + 5 + §) 

® \ o Vn on/ 

= K + y/nL{x) log no + (no + y/nLix) + 5 + J) 

Aov^^on o*n ^ \n»«// 

= K + Vnl(»)(l + log «o) + ^ + 0 

so terms arising from b (and 6 + i in the exponent) will be neglected as^they 
give rise only to terms independent of the x< or of order 1/n*. Of course log 
(1 + 0(l/»»)) = 0(1 /to). Thus, keeping significant terms only, the result of the 
substitutions (5.3) and (5.4) in (2.13) after taking logarithms and using|(5.6) is 


-log P(n) = K + y/nT,Xi (log ne{4 + D + 2 

1 1 2eie2 

- Vn (log ncj + 1) + ~ x^ 
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(5.7) + y/n + (fc — (log nej ^ 

+ 2 Vwx* (log ncte 2 + 1) + Vn f 2 (log neV‘^ + 1) 

elet \ 1 / 

The coeflScients of Xi{i < k) and x» are 

•\/n(log + 1 — log ncj — 1 + 1 log m\ + i — i log tic*'*'* — ») ®= 0i 
\/n( — log ne| — 1 + A: log nc{ + A: + 2 log ne{e» + 2 — A: log ne*'*'* — A:) == 0. 
Hence only the quadratic terms remain and we have 

(5.8) 
where 

= 


(5.9) 


a 




-logP = K + 


1 , ijet 

A t +1 

i, j < ft. 

Cj Cl 


1 , 1 , i*c* 

t < ft. 

C 2 62 61 


1 t “f* i{k — 1)62 

€t ei 

i < ft. 


H _ 1 . 2 , ft* (ft - D* 

2 "r "S r "t+i ^ 

ej cje, cr 




It is merely a matter of straightforward multiplication of the two matrices to 
verify that 1 | «r*^ |1 is the inverse of || <nf 1 |, hence is a positive definite matrix. 
The details of the verification will be omitted. We have then 

(5.10) P = ^1+0 


In this equation K miist necessarily contain the factor 



because there 


are ft + 5 factorials in the denominator and 5 in the numerator of (2.13). 
Since Ar,- = 1 , this factor, in view of (5.1), may be replaced by nAx< , so 


(5.11) 





If we restrict the x,- to any finite region R in the x-space, the function 0(l/n*) 
approaches zero uniformly as n — » «. Thus, if are any positive 
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numbers such that the corresponding values of Xi , say at and , obtained by 
substituting Ai and Bi for n in (5.1), determine a rectangular region iZ'(a< < Xi < 
b{), which lies in i2 we have 

Z P(r<) = E 

( 6 . 12 ) 

Jr* 

by the definition of a definite integral and Riemann’s fundamental theorem. 

We have given some details of this proof in order that it may serve as a model 
for other theorems of a similiar nature which wDl appear later, and for which 
a complete proof will not be given. Two immediate consequences of Theorem 1 
will now be stated as corollaries. 

Corollary 1. The variable 



r-neiet 

3U ^ I.M.III M III- 

\/n€i€2 

where r is the total mmher of runs of one kind of element, is asymptotically normally 
distributed with zero mean and unit variance. The limiting mean and variance 
were computed from (3.20) and (3.21). 

Corollary 2. The variable Q = is asymptotically distributed accord- 
ing to the with k degrees of freedom. 

In exactly the same manner in which Theorem 1 was deduced from (2.13), 
we may prove the following theorem corresponding to the distribution (2.14). 
Theorem 2. The variables 


(5.13) 



i < k, 


i < A, 


are asymptotically normally distributed mtfi] zero means and variances and 
covariances 


(TxiXj = + l)(i + i)ciei — ije 2 — 2ei] 

ffttzt = + l)*Ciej - i*c, - 2ei] + 

CTxfti, = + l)ifcciC2 — t’fcej — Ci] 

O'**** = ej* ‘ej[— k^el — ej + ejcj , 


i, j < k, 
i < k, 
i < k, 
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(6.14) == e2^^^e\[(i + l){j + l)^i^ ~ — Zes] i^j < A, 

^viVi ~ ^ 2 * “■ 2^2] + elei i < A, 

<^x<y/ = + l){j + l)eie2 - 2ie2 - 2j6i + 4eie2 + 2] 

i < k,j < h, 

^•kVi ““ 2(k — 1)^2 {j — 1)^1 + 26162 ] j < h. 

These limiting variances were computed from the variances and covariances 
given in Section 3. We have chosen the variable S 2 a of (2.14) as the dependent 
variable. The proof of this theorem is omitted. From it the following corol- 
laries are deduced immediately. 

Corollary 3. If Ui = and Uk-^i = j/,- of (5.13) and \\a^ \\ (t, j = 1, 2, 
• • • , Ai + /i — 1) denotes the inverse of (5.14), then the variable Q = is 

asymptotically distributed according to the x^law with k + h ^ \ degrees of freedom. 

Corollary 4. If Si = su + Sa denotes the total number of runs of both kinds of 
elements of length f, and Su the total number of runs of length greater than — 1, 
then the variables 


«.• - n(ele| + eje?) ^ ^ ^ 

X% — —————— ^ K 


(5.15) 


Xk 


■\/n 

Sit — n(eiCi + e*ei) 

v« 


are asymptotically normally distributed with zero means and variances and 
covariances 


(5.16) Oij — + o’.jvj "f" ^rxiVi "I" ffwjvj • 

We have put h = kin Theorem 2 to obtain this result. The terms on the right 
of (5.16) are defined by (5.14); terms which do not appear there may be found 
by interchanging ci and et in one of the relations. For example is given by 
interchanging ei and c* in the fourth equation of the set (5.14). 

Corollary 5. The variable $Q = \sum \sigma^{ij}X_iX_j$, where the $X_i$ are defined by (5.15) and $\|\sigma^{ij}\|$ is the inverse of (5.16), is asymptotically distributed according to the $\chi^2$-law with $k$ degrees of freedom.

Corollary 6. If $r$ denotes the total number of runs of both kinds of elements, then the variable

$X = \dfrac{r - 2ne_1e_2}{2\sqrt n\,e_1e_2}$

is asymptotically normally distributed with zero mean and unit variance. This is the result derived by Wald and Wolfowitz [13].
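Corollary 6 can be checked against exact small-sample calculations. The Python sketch below enumerates every arrangement of $n_1$ ones and $n_2$ zeros, computes the exact mean and variance of the total number of runs, and compares them with the standard finite-sample formulas $E(r) = 1 + 2n_1n_2/n$ and $\operatorname{Var}(r) = 2n_1n_2(2n_1n_2 - n)/[n^2(n - 1)]$. These formulas are not stated in this section, so the sketch treats them as an assumption. Their leading terms are the $2ne_1e_2$ and $4ne_1^2e_2^2$ used in the corollary. All function names are illustrative.

```python
from fractions import Fraction
from itertools import combinations


def total_runs(seq):
    """Number of maximal blocks of equal adjacent symbols in seq."""
    if not seq:
        return 0
    return 1 + sum(a != b for a, b in zip(seq, seq[1:]))


def exact_run_moments(n1, n2):
    """Exact mean and variance of the total number of runs when n1 ones and
    n2 zeros are arranged at random (all C(n, n1) arrangements equally likely)."""
    n = n1 + n2
    values = []
    for ones in combinations(range(n), n1):
        pos = set(ones)
        values.append(total_runs([1 if i in pos else 0 for i in range(n)]))
    mean = Fraction(sum(values), len(values))
    var = Fraction(sum(v * v for v in values), len(values)) - mean ** 2
    return mean, var


def wald_wolfowitz_moments(n1, n2):
    """Standard finite-sample formulas (assumed here, not taken from the text)."""
    n = n1 + n2
    mean = 1 + Fraction(2 * n1 * n2, n)
    var = Fraction(2 * n1 * n2 * (2 * n1 * n2 - n), n * n * (n - 1))
    return mean, var


def corollary6_statistic(r, n1, n2):
    """X = (r - 2 n e1 e2) / (2 sqrt(n) e1 e2), asymptotically N(0, 1)."""
    n = n1 + n2
    e1, e2 = n1 / n, n2 / n
    return (r - 2 * n * e1 * e2) / (2 * n ** 0.5 * e1 * e2)


print(exact_run_moments(6, 4) == wald_wolfowitz_moments(6, 4))  # True
for m in (10, 100, 1000):
    _, v = wald_wolfowitz_moments(3 * m, 2 * m)
    print(5 * m, float(v) / (5 * m))  # approaches 4 * 0.6**2 * 0.4**2 = 0.2304
```

Exact rational arithmetic is used so the comparison is an equality test rather than a tolerance check. The loop shows $\operatorname{Var}(r)/n$ settling toward $4e_1^2e_2^2$ with $e_1 = 0.6$ held fixed.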


6. Asymptotic distributions for $k$ kinds of elements. We now investigate the
asymptotic character of the distribution (4.2)
(6.1)  $P(r_i) = \prod_{i=1}^{k}\binom{n_i - 1}{r_i - 1}\,F(r_i) \Big/ \binom{n}{n_1, \ldots, n_k}$,

where $r_i$ is the total number of runs of the $i$th kind of element.

Theorem 1. If $k > 2$, the variables

(6.2)  $x_i = \dfrac{r_i - ne_i(1 - e_i)}{\sqrt n}$, $i = 1, 2, \ldots, k$,

are asymptotically normally distributed with zero means and variances and covariances

(6.3)  $\sigma_{ii} = e_i^2(1 - e_i)^2$,  $\sigma_{ij} = e_i^2e_j^2$ ($i \ne j$).

The restriction $k > 2$ is made because in the case $k = 2$ the correlation between the two variables approaches one, and the numbers $\sigma_{ij}$ are all equal. The result may be called a degenerate normal distribution and might be included in the theorem in this sense; we have chosen to omit it because this case is better taken care of by Corollary 1 of the previous section.

The proof of this theorem will be simplified if in the moments (4.5) we replace
the numbers $n_i - 1$ by $n_i$. This substitution will not, of course, affect the
limiting moments. Hence we consider the variables $r_i$ with moments given by


«(«<) f” “ 

1 / I n 


and shall show that the variables $(r_i - ne_i(1 - e_i))/\sqrt n$ are asymptotically normally distributed with zero means and variances and covariances (6.3). It is possible to prove this statement by showing that the characteristic function (Fourier transform) obtained by substituting for $t_i$ in the moment generating function

(6.6)  $\phi_n(t_i) = \text{Coef. of } \prod_i x_i^{n_i} \text{ in } \prod_{i=1}^{k}(x_1 + \cdots + x_{i-1} + t_ix_i + x_{i+1} + \cdots + x_k)^{n_i} \Big/ \binom{n}{n_1, \ldots, n_k}$

approaches the characteristic function of the corresponding normal distribution as $n \to \infty$. This method is not appropriate for proving a similar theorem which appears in Part II, and we prefer to give here a demonstration that will suffice for both theorems.

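In order to prove our theorem we consider the general term in the coefficient of $\prod x_i^{n_i}$ in (6.6),

(6.7)  $\prod_{i=1}^{k}\binom{n_i}{m_{i1}, \ldots, m_{ik}}\, t_i^{m_{ii}} \Big/ \binom{n}{n_1, \ldots, n_k}$,

in which

(6.8)  $\sum_{i=1}^{k} m_{ij} = n_j$

must be required, as well as the usual restriction on the indices of a multinomial coefficient, $\sum_{j=1}^{k} m_{ij} = n_i$. Therefore only $(k - 1)^2$ of the indices are independent.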

Clearly $r_i = n_i - m_{ii}$. Now, without concerning ourselves about the statistical significance of the variables $m_{ij}$, let us consider their distribution

(6.9)

in which the variables corresponding to the values $i, j = 1, 2, \ldots, k - 1$ will be chosen as the independent ones. We shall now prove a theorem from which Theorem 1 follows immediately.

Theorem 2. The variables

(6.10)  $x_{ij} = \dfrac{m_{ij} - ne_ie_j}{\sqrt n}$, $i, j = 1, 2, \ldots, k - 1$,

are asymptotically normally distributed with zero means and variances and covariances given by

^ijkPQ “ , 

( 6 . 11 ) ~ ““ ®»(1 ^i)^j^p f 

(Tii.ti - e,«,(l - c,)(l - e,). 

First it is to be noted that the moments of the $m_{ij}$ are easily obtained from the
identity


( 6 . 12 ) 
as follows 


E n -»!?'> n 1 1 - £ n «!"•»’ n ? “"I 

a i ^ \_mii — Oij J 
and on dividing this last relation by J we obtain
(6.13) »»«"’) - n n 

{.i { » 

from which the moments (6.11) and the means in (6.10) were computed. 

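The proof of the theorem is similar to that of Theorem 1 in Section 5. We
make the substitutions

$n_i = ne_i$,  $m_{ik} = n_i - \sum_{j=1}^{k-1} m_{ij}$,  $m_{kj} = n_j - \sum_{i=1}^{k-1} m_{ij}$,  $m_{kk} = 2n_k - n + \sum_{i,j=1}^{k-1} m_{ij}$,  $m_{ij} = ne_ie_j + \sqrt n\,x_{ij}$,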


in (6.9) and employ Stirling's formula exactly as before. The details are too
similar to warrant repetition. The final result is

(6.14) d(to//) = d*// ( 1 + 0 . 

Where 1| || is the inverse of (6.11) and is defined by 



0..7 =1.1 


1 


= -, H 1- 


+ 


C/C, 


_0'.*P _ ^ 4. ^ — — I- ^ 

o 1- — , c h -j. 

eie* 6* CiC* e* 

Theorem 1 is a corollary of Theorem 2. Also we may state these additional 
results: 

Corollary 1. If $k$ ($\ge 3$) kinds of elements are arranged at random and $r$ denotes the total number of runs of all kinds of elements, then the variable

$x = \dfrac{r - n(1 - \sum e_i^2)}{\sqrt n}$

is asymptotically normally distributed with zero mean and variance

$\sigma^2 = \sum e_i^2 - 2\sum e_i^3 + \left(\sum e_i^2\right)^2$,

where $e_i$ is the proportion of elements of the $i$th kind.
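Corollary 1 can be checked in two simple ways with the hedged sketch below (function names are illustrative). First, for $k = 2$ its variance should reduce to the $4e_1^2e_2^2$ of Corollary 6 of Section 5. Second, exact enumeration of all distinct arrangements should give the mean $E(r) = n(1 - \sum e_i^2) + 1$, whose leading term is the centring constant above. This exact mean is our own addition. It follows from $P(\text{two neighbours differ}) = 1 - \sum n_i(n_i - 1)/[n(n - 1)]$ and is not a formula stated in the text.

```python
from fractions import Fraction
from itertools import permutations


def sigma2_fixed_counts(e):
    """Limiting variance in Corollary 1: sum e^2 - 2 sum e^3 + (sum e^2)^2."""
    s2 = sum(x ** 2 for x in e)
    s3 = sum(x ** 3 for x in e)
    return s2 - 2 * s3 + s2 ** 2


def total_runs(seq):
    return 1 + sum(a != b for a, b in zip(seq, seq[1:]))


def exact_mean_runs(counts):
    """Exact E(r) over all distinct arrangements of counts[i] elements of kind i."""
    items = [kind for kind, c in enumerate(counts) for _ in range(c)]
    arrangements = set(permutations(items))  # each distinct arrangement equally likely
    return Fraction(sum(total_runs(a) for a in arrangements), len(arrangements))


counts = (3, 2, 2)
n = sum(counts)
e = [Fraction(c, n) for c in counts]
print(exact_mean_runs(counts) == n * (1 - sum(x ** 2 for x in e)) + 1)  # True
e1 = Fraction(3, 10)
print(sigma2_fixed_counts([e1, 1 - e1]) == 4 * e1 ** 2 * (1 - e1) ** 2)  # True
```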

Corollary 2. The variable $Q = \sum \sigma^{ij}x_ix_j$, where the $x_i$ are defined by (6.2)
and $\|\sigma^{ij}\|$ is the inverse of (6.3), is asymptotically distributed according to the
$\chi^2$-law with $k$ degrees of freedom.

As was mentioned in Section 4, we could define variables $r_{ij}$ ($i = 1, 2, \ldots, k$
and $j = 1, 2, \ldots, h_i$, the $h_i$ being a set of $k$ arbitrary integers) with a distribution
similar to (2.14). If one worked through the details he would find, no
doubt, that these variables are asymptotically normal. The matrix of variances
and covariances is so complicated, however, that such a theorem would
hardly be useful to the statistician, and the author does not feel that it would
be worthwhile to go through the long and tedious details merely for the sake of
completeness.


Part II 

Instead of having the number of elements of each kind fixed, we now suppose
that they are randomly drawn from a binomial or multinomial population. The
numbers $n_i$ thus become random variables subject only to the restriction that
$\sum n_i = n$, the sample number. The development will be entirely analogous to
that of Part I, and the same notation will be used. The probability associated
with the $i$th kind of element will be denoted by $p_i$.

7. Distributions and moments. The major part of the derivation of the
various distribution functions has already been done in Sections 2 and 3. With
the distributions of these sections we need only employ the fundamental
relation

(7.1)  $P(X, Y) = P(X \mid Y)\,P(Y)$

in order to obtain the distributions required here. $X$ will represent the set of
variables $r_{ij}$ or $r_i$, and $Y$ the variables $n_i$. For the binomial population
$P(Y)$ will be

(7.2)  $P(n_1, n_2) = \binom{n}{n_1}p_1^{n_1}p_2^{n_2}$.

Therefore we may write down at once the distributions 

(7.3) P(r.„ n.) - ['J 'M'P!', 

(7.4) P(n„ n,) - 

(7.5) P(r., «,) - : J) ("* + 'y' K*. 

(7.0) PC.,, n.) - [,-](- - " '>•■* - ')(»• + 

(fh- B- {h- l)sto - l\ 

\ )nsi,s,)p,P2, 

1 , ... 1 , 
corresponding to the distributions (2.6), (2.9), (2.11), (2.13) and (2.14) respectively. Of course there is some dependence among the arguments. In (7.4), for example, $n_1$ is determined by $\sum_i ir_{1i} = n_1$, and $n_2$ by $n - n_1 = n_2$. In the last three distributions one of the $n_i$ is independent, and one may sum with respect to $n_1$ from zero to $n$ to obtain the distributions of the $r$'s alone. The results of such summations are quite cumbersome and in some cases can only be indicated, so we shall retain the $n_i$ as relevant variables. This remark applies also to the multinomial distribution.

We shall obtain expressions for the joint moments of the variables in these
distributions. It is clear that the moments in Section 3 will be of considerable
aid; for, using the notation of (7.1), we have

(7.8)  $E(f(X)g(Y)) = \sum_{X,Y} f(X)g(Y)P(X, Y) = \sum_{Y} g(Y)P(Y)\Big[\sum_{X} f(X)P(X \mid Y)\Big]$,

and the sum in the bracket on the right has been computed in Section 3. It remains
only for us to multiply the previous moments by $g(Y)P(Y)$ and sum on
$Y$. Corresponding to (3.4), (3.12), (3.9) and (3.19) we have

(7.9) n r["A = t «i“’(n* + I)'*'*’ (” “ ^^7 ^ApVp2*, 

\ I / n,-0 \ m — SiOi / 

(7.10) tn!"(«. + l)''“’(" 

\ I / n,-o \ nx — ZtOi / 

(7.11) E(nl*>rl‘') - t »!•>(«, + 1)>“(" “ 

wi-0 \Wl — 0 / 

Z h - + a* - 1\ 

\ ^ — 1 / 

\ 88 — 2 6 / — 1 / 

for moments from (7.4), (7.6), (7.5) and (7.7) respectively. In order to perform
the summations indicated in these last relations it is necessary to expand the
factors multiplying the binomial coefficient in factorial powers of its lower
index. That is, we must write

(7.13) ni“*(na 1)“” = Z ®> W(»i — 

<>0 

Again it is not possible to give a simple expression for the coefficients $c(t, a, b)$
in general, but for the first few moments they present no difficulty. For example,
from (7.9)


n ri = 

(7.12) 
EiniTid = »i(» ~ “t* „ * ,•

m-o \ n\ — t 


( 7 , 14 ) / _ • \ 

“ 2 |^*(n ~ y „j !_ j ^) + (” ■“ 2*)(« - * - 1) 

= [i(n — t + 1) + (n — 2t) (n — t — l)pi — (n — t — 1)® pilpipi. 

We give below some means, variances and covariances which will be required 
later. 


= 23 [♦(♦» — t + 1) + (n — 2t)(ni — t) + (hi — t)®] 




Eini) = pipj[(n - t - l)p, + 2], 

E{svk) - pt[(n - k)-pi + 1], 

- i - j)”V 2 + (n - i - j)p*(l + 6pi) + 6pJ 

- ((n - i - l)p, + 2)[(n - j - l)pj + 2]}, 
fl’ruru = Pi*Pj{(n - 2i)®?^ + (n - 2i)pt(l + 5pi) + 6p! 

- [(» - i - l)pi + 2]*} + p{p*[(n - t - 1 )pj + 2], 

(7.15) (Tr.jrjy = p{pi{{n - i - j - 2)‘®piP* + 4(n - i - j - l)pip, + 2 

- I(n - i - l)pt + 2][(n - j - l)pi + 2], 
<^*u>u * Pi‘*'*p*{(« - t - * + 1)® - 2(n - i - A:)®P 1 

+ (n - i - k - l)®pi - [(« - i - 1)Pj + 2)[(n - k)p, + 1]}, 
"■•u*!* ® Pi*{(« - 2k + 1)®^ - 2(n - 2fc)‘“pi + (« - 2A:)”’p? 

- [(n - k)pi + 1]*} + pt[(n - k)pt + 1], 
= PiP*{(w - k - 3 - 2)”ViP* + 2(n - A: - J - l)pi(l + pi) 
+ 2(1 + pi) - pi[(n - A:)pj + l][(n - j - l)pi + 2]}. 

In order to obtain the distribution of runs in samples from a multinomial
population, we multiply the distributions of Section 4 by

(7.16)  $P(n_i) = \binom{n}{n_1, \ldots, n_k}\prod_{i=1}^{k} p_i^{n_i}$.
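Corresponding to (4.1) and (4.2), then, we have

(7.17)  $P(r_{ij}, n_i) = \prod_{i=1}^{k}\binom{r_i}{r_{i1}, r_{i2}, \ldots}\,F(r_i)\prod_{i=1}^{k} p_i^{n_i}$,

(7.18)  $P(r_i, n_i) = \prod_{i=1}^{k}\binom{n_i - 1}{r_i - 1}\,F(r_i)\prod_{i=1}^{k} p_i^{n_i}$.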
In (7.17) $r_{ij}$ is the number of runs of length $j$ of elements with probability $p_i$.
In (7.18) $r_i$ is the total number of runs of elements with probability $p_i$. As
before, we shall investigate in detail only the distribution (7.18). The moments
of $n_i$ and $r_i$ follow at once from (7.8) and (4.5):

(7.19) s(n (»!••’.!“’)) = E n («!■"(«. - 1)"") [V-t'] 9 

where u< = n.- — u . The means, variances and covariances of the u are 
$E(r_i) = np_i(1 - p_i) + p_i^2$,

(7.20)  $\sigma_{r_ir_j} = -np_ip_j(1 - 2p_i - 2p_j + 3p_ip_j) - p_ip_j(2p_i + 2p_j - 5p_ip_j)$,

$\sigma_{r_ir_i} = np_i(1 - 4p_i + 6p_i^2 - 3p_i^3) + p_i^2(3 - 8p_i + 5p_i^2)$.
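The moments (7.20) can be confirmed by brute force for a small sample. The sketch below enumerates all $k^n$ sequences of independent draws with probabilities $p_i$, weights each sequence by its probability, and compares the exact mean of $r_i$, the variance of $r_i$ and the covariance of $r_i, r_j$ with the closed forms. The formulas as coded are our reading of (7.20), valid for $n \ge 2$, with the constant term of $E(r_i)$ taken as $p_i^2$; that is the value the exact computation produces. Helper names are hypothetical.

```python
from fractions import Fraction
from itertools import product


def runs_by_kind(seq, k):
    """r[i] = number of runs of symbol i (symbols are 0..k-1)."""
    r = [0] * k
    prev = None
    for s in seq:
        if s != prev:  # a new run starts at this position
            r[s] += 1
        prev = s
    return r


def exact_moments(p, n, i, j):
    """Exact E(r_i), Var(r_i), Cov(r_i, r_j) by enumerating all k**n sequences."""
    k = len(p)
    m_i = m_j = m_ii = m_ij = Fraction(0)
    for seq in product(range(k), repeat=n):
        w = Fraction(1)
        for s in seq:
            w *= p[s]
        r = runs_by_kind(seq, k)
        m_i += w * r[i]
        m_j += w * r[j]
        m_ii += w * r[i] ** 2
        m_ij += w * r[i] * r[j]
    return m_i, m_ii - m_i ** 2, m_ij - m_i * m_j


def moments_720(p, n, i, j):
    """Closed forms of (7.20) for E(r_i), Var(r_i) and Cov(r_i, r_j), i != j."""
    a, b = p[i], p[j]
    mean = n * a * (1 - a) + a ** 2
    var = n * a * (1 - 4 * a + 6 * a ** 2 - 3 * a ** 3) + a ** 2 * (3 - 8 * a + 5 * a ** 2)
    cov = -n * a * b * (1 - 2 * a - 2 * b + 3 * a * b) - a * b * (2 * a + 2 * b - 5 * a * b)
    return mean, var, cov


p = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]
print(exact_moments(p, 5, 0, 1) == moments_720(p, 5, 0, 1))  # True
```

The leading ($n$-proportional) terms are the limiting moments used later in Section 9. The constant terms only matter for small samples.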


8. Asymptotic distributions from binomial population. We turn our attention
first to the distribution (7.7) and state a theorem analogous to Theorem 2 of
Section 5.

Theorem 1. The variables

(8.1)
$u_i = x_i = \dfrac{s_{1i} - np_1^ip_2^2}{\sqrt n}$, $i = 1, \ldots, k - 1$,
$u_k = x_k = \dfrac{s_{1k} - np_1^kp_2}{\sqrt n}$,
$u_{k+i} = y_i = \dfrac{s_{2i} - np_1^2p_2^i}{\sqrt n}$, $i = 1, \ldots, h - 1$,
$u_{k+h} = z = \dfrac{n_1 - np_1}{\sqrt n}$,

are asymptotically normally distributed with zero means and variances and covariances

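(8.2)
$\sigma_{x_ix_i} = p_1^ip_2^2 - (2i + 1)p_1^{2i}p_2^4 + 2p_1^{2i+1}p_2^3$,
$\sigma_{x_ix_j} = -(i + j + 1)p_1^{i+j}p_2^4 + 2p_1^{i+j+1}p_2^3$,
$\sigma_{x_ix_k} = -(i + k + 1)p_1^{i+k}p_2^3 + p_1^{i+k+1}p_2^2$,
$\sigma_{x_kx_k} = p_1^kp_2 - (2k + 1)p_1^{2k}p_2^2$,
$\sigma_{y_iy_j} = -(i + j + 1)p_1^4p_2^{i+j} + 2p_1^3p_2^{i+j+1}$,
$\sigma_{y_iy_i} = p_1^2p_2^i - (2i + 1)p_1^4p_2^{2i} + 2p_1^3p_2^{2i+1}$,
$\sigma_{x_iy_j} = -(i + j + 3)p_1^{i+2}p_2^{j+2} + 2p_1^{i+1}p_2^{j+1}$,
$\sigma_{x_ky_j} = -(k + j + 2)p_1^{k+2}p_2^{j+1} + p_1^{k+1}p_2^j(1 + p_2)$,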
ff.i» = *PiP» + Pi‘^‘pj( 1 - 4p»), 

<r.*, = (A -f l)pipl - pf(l + p»), 
o^»i. = »P*P» + PiPj'^^I - 4px), 


$\sigma_{zz} = p_1p_2$.
We have taken $s_{2h}$ and $n_2$ to be the dependent variables of (7.7). The method of
proof of this theorem is the same as that of Theorem 1 in Section 5, and will be
omitted. As consequences of the theorem we have

Corollary 1. The variable

$Q = \sum_{i,j=1}^{k+h} \sigma^{ij}u_iu_j$

is asymptotically distributed according to the $\chi^2$-law with $k + h$ degrees of freedom.

Corollary 2. Any subset $u_{i_1}, u_{i_2}, \ldots, u_{i_m}$ of the variables (8.1) is asymptotically normally distributed with zero means and variances and covariances $\|\sigma_{i_\alpha i_\beta}\|$, and

$Q_m = \sum_{\alpha,\beta=1}^{m} \sigma^{i_\alpha i_\beta}u_{i_\alpha}u_{i_\beta}$

is asymptotically distributed according to the $\chi^2$-law with $m$ degrees of freedom. $\|\sigma^{i_\alpha i_\beta}\|$ is the inverse of $\|\sigma_{i_\alpha i_\beta}\|$.

Corollary 3. If $S_i = s_{1i} + s_{2i}$ represents the total number of runs of length $i$ of both kinds of elements, and $S_k$ the number of runs of length greater than $k - 1$, then the variables

(8.3)  $X_i = \dfrac{S_i - n(p_1^ip_2^2 + p_1^2p_2^i)}{\sqrt n}$, $i = 1, \ldots, k - 1$;  $X_k = \dfrac{S_k - n(p_1^kp_2 + p_1p_2^k)}{\sqrt n}$

are asymptotically normally distributed with zero means and variances and covariances

(8.4)  $\sigma_{ij} = \sigma_{x_ix_j} + \sigma_{y_iy_j} + \sigma_{x_iy_j} + \sigma_{x_jy_i}$,

where the terms on the right of (8.4) are defined by (8.2). We have put $h = k$ in Theorem 1 to obtain this result.

Corollary 4. The variable

(8.5)  $Q = \sum_{i,j=1}^{k} \sigma^{ij}X_iX_j$,

where the $X_i$ are defined by (8.3) and $\|\sigma^{ij}\|$ is the inverse of (8.4), is asymptotically distributed according to the $\chi^2$-law with $k$ degrees of freedom.

Corollary 5. If $r$ denotes the total number of runs of both kinds of elements, then

(8.6)  $\dfrac{r - 2np_1p_2}{2\sqrt{np_1p_2(1 - 3p_1p_2)}}$

is asymptotically normally distributed with zero mean and unit variance. This is the result obtained by Wishart and Hirshfeld [11].
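A small exact check of (8.6) is possible. For independent draws, only neighbouring "changes" between consecutive elements are correlated. As a result, $\operatorname{Var}(r)$ is exactly linear in $n$ for $n \ge 2$; this is our derivation, not a statement of the text. Adding one element should therefore increase the variance by exactly $4p_1p_2(1 - 3p_1p_2)$, the squared denominator of (8.6) divided by $n$. The sketch verifies this by exhaustive enumeration and also codes the statistic itself. Names are illustrative.

```python
from fractions import Fraction
from itertools import product
from math import sqrt


def total_runs(seq):
    return 1 + sum(a != b for a, b in zip(seq, seq[1:]))


def exact_total_run_moments(n, p1):
    """Exact E(r), Var(r) for n independent draws with P(kind 1) = p1 (n >= 2)."""
    p2 = 1 - p1
    mean = second = Fraction(0)
    for seq in product((1, 2), repeat=n):
        n1 = seq.count(1)
        w = p1 ** n1 * p2 ** (n - n1)
        r = total_runs(seq)
        mean += w * r
        second += w * r * r
    return mean, second - mean ** 2


def statistic_8_6(r, n, p1):
    """(r - 2 n p1 p2) / (2 sqrt(n p1 p2 (1 - 3 p1 p2))), asymptotically N(0, 1)."""
    u = p1 * (1 - p1)
    return (r - 2 * n * u) / (2 * sqrt(n * u * (1 - 3 * u)))


p1 = Fraction(1, 3)
u = p1 * (1 - p1)                      # p1 * p2
m6, v6 = exact_total_run_moments(6, p1)
_, v7 = exact_total_run_moments(7, p1)
print(m6 == 1 + 2 * 5 * u)             # True: E(r) = 1 + 2(n - 1) p1 p2
print(v7 - v6 == 4 * u * (1 - 3 * u))  # True: one more element adds 4 p1 p2 (1 - 3 p1 p2)
```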
9. Asymptotic distributions from the multinomial population. In this
section we assume $k > 2$ to avoid degenerate distributions. Because of the
function $F(r_i)$ in (7.18) we do not investigate this distribution directly, but
derive a more general asymptotic distribution as was done in Section 6. We
consider the distribution

(9.1)

corresponding to (6.9). This is derived from (7.19) in the same manner as
(6.9) was from (4.5). As before, we have replaced the numbers $n_i - 1$ in (7.19)
by $n_i$, an unessential change as far as the asymptotic theory is concerned.
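We recall that

(9.2)  $\dfrac{r_i - np_i(1 - p_i)}{\sqrt n} = \dfrac{n_i - np_i}{\sqrt n} - \dfrac{m_{ii} - np_i^2}{\sqrt n}$;

hence we need only show that the variables on the right are asymptotically
normally distributed in order to have the same result for the $r_i$. Corresponding
to Theorem 2 of Section 6, we state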
Theorem 1. The variables

(9.3)  $x_{ij} = \dfrac{m_{ij} - np_ip_j}{\sqrt n}$, $i, j = 1, \ldots, k - 1$;  $z_i = \dfrac{n_i - np_i}{\sqrt n}$, $i = 1, \ldots, k - 1$,

are asymptotically normally distributed with zero means and variances and co- 
variances 


(9.4) 


~ ^piP iP»pt , 
~ ""3p*p«P( , 

<r*t,yy = '^^PiPi , 

~ ^PiPt , 


(Tii.i — 2p<(l POj 

= P<(1 - P<). 

In these relations the symbols are defined by 


= -^piPiPt , 

= p?p,(l - 3p<), 

Oii.ii = P?(l + 2p< — 3pi), 
= -2p<p,-p, , 

= P<P>(1 - 2p<), 
v»,j = ~P»Pj I 


and different literal subscripts represent different numerical subscripts. These
moments have been computed by means of the identity (6.12). The proof of
the theorem is like that of Theorem 2 of Section 6 and will be omitted. We can
now give the limiting form of the distribution of the $r_i$ in (7.18) as
Corollary 1. The variables

(9.5)  $x_i = \dfrac{r_i - np_i(1 - p_i)}{\sqrt n}$, $i = 1, 2, \ldots, k$,

are asymptotically normally distributed with zero means and variances and covariances

(9.6)  $\sigma_{ii} = p_i(1 - p_i) - 3p_i^2(1 - p_i)^2$,  $\sigma_{ij} = -p_ip_j(1 - 2p_i - 2p_j + 3p_ip_j)$.

These limiting moments follow at once from equations (7.20).

Corollary 2. The variable

$Q = \sum \sigma^{ij}x_ix_j$,

where the $x_i$ are defined by (9.5) and $\|\sigma^{ij}\|$ is the inverse of (9.6), is asymptotically distributed according to the $\chi^2$-law with $k$ degrees of freedom.
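Corollary 2 can be exercised numerically. The pure-Python sketch below is a Monte Carlo illustration, not a proof, and all names in it are hypothetical. It draws sequences of independent elements with probabilities $p_i$ and forms the $x_i = (r_i - np_i(1 - p_i))/\sqrt n$ and the limiting covariance matrix with entries $\sigma_{ii} = p_i(1 - p_i) - 3p_i^2(1 - p_i)^2$ and $\sigma_{ij} = -p_ip_j(1 - 2p_i - 2p_j + 3p_ip_j)$. It then evaluates $Q = x^{\mathsf T}\Sigma^{-1}x$ by solving a linear system rather than inverting the matrix. If the corollary holds, the average of $Q$ over many samples should be close to $k$, the mean of a $\chi^2$ variable with $k$ degrees of freedom.

```python
import math
import random


def runs_by_kind(seq, k):
    r = [0] * k
    prev = None
    for s in seq:
        if s != prev:
            r[s] += 1
        prev = s
    return r


def limiting_cov(p):
    """Limiting covariances of x_i = (r_i - n p_i (1 - p_i)) / sqrt(n)."""
    k = len(p)
    return [[p[i] * (1 - p[i]) - 3 * p[i] ** 2 * (1 - p[i]) ** 2 if i == j
             else -p[i] * p[j] * (1 - 2 * p[i] - 2 * p[j] + 3 * p[i] * p[j])
             for j in range(k)] for i in range(k)]


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[piv] = m[piv], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for cc in range(c, n + 1):
                m[r][cc] -= f * m[c][cc]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][cc] * x[cc] for cc in range(r + 1, n))) / m[r][r]
    return x


def q_statistic(seq, p):
    """Q = sum sigma^{ij} x_i x_j, computed as x . (Sigma^{-1} x)."""
    n, k = len(seq), len(p)
    r = runs_by_kind(seq, k)
    x = [(r[i] - n * p[i] * (1 - p[i])) / math.sqrt(n) for i in range(k)]
    y = solve(limiting_cov(p), x)
    return sum(xi * yi for xi, yi in zip(x, y))


rng = random.Random(12345)
p = [0.5, 0.3, 0.2]
qs = [q_statistic(rng.choices(range(3), weights=p, k=1000), p) for _ in range(1000)]
print(sum(qs) / len(qs))  # should be near k = 3
```

Solving the system avoids forming an explicit inverse. The choice $p = (0.5, 0.3, 0.2)$, $n = 1000$ is arbitrary and only meant to keep the run time short.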

Corollary 3. If $r = \sum r_i$ denotes the total number of runs, then

$\dfrac{r - n(1 - \sum p_i^2)}{\sqrt n}$

is asymptotically normally distributed with zero mean and variance

$\sigma^2 = \sum p_i^2 + 2\sum p_i^3 - 3\left(\sum p_i^2\right)^2$.
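The variance in Corollary 3 has a simple exact counterpart. Write $r = 1 + \sum_t D_t$, where $D_t$ indicates that elements $t-1$ and $t$ differ. Only neighbouring $D_t$ are correlated, so for $n \ge 2$ the exact $\operatorname{Var}(r)$ is linear in $n$ with slope $\sum p_i^2 + 2\sum p_i^3 - 3(\sum p_i^2)^2$. This argument is ours, not the text's. The sketch below checks the slope by enumerating all $k^n$ sequences for $k = 3$. It also checks that for $k = 2$ the formula reduces to the $4p_1p_2(1 - 3p_1p_2)$ of Wishart and Hirshfeld. Names are illustrative.

```python
from fractions import Fraction
from itertools import product


def total_runs(seq):
    return 1 + sum(a != b for a, b in zip(seq, seq[1:]))


def sigma2_multinomial(p):
    """Limiting variance of (r - n(1 - sum p_i^2)) / sqrt(n) in Corollary 3."""
    s2 = sum(x ** 2 for x in p)
    s3 = sum(x ** 3 for x in p)
    return s2 + 2 * s3 - 3 * s2 ** 2


def exact_var_total_runs(p, n):
    """Exact Var(r) for n independent draws with probabilities p."""
    mean = second = Fraction(0)
    for seq in product(range(len(p)), repeat=n):
        w = Fraction(1)
        for s in seq:
            w *= p[s]
        r = total_runs(seq)
        mean += w * r
        second += w * r * r
    return second - mean ** 2


p = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]
print(exact_var_total_runs(p, 5) - exact_var_total_runs(p, 4) == sigma2_multinomial(p))  # True
a = Fraction(1, 4)
print(sigma2_multinomial([a, 1 - a]) == 4 * a * (1 - a) * (1 - 3 * a * (1 - a)))  # True
```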


The author would like to record here his gratitude to Professor S. S. Wilks 
who suggested the problem and under whose direction this paper was written. 
A GENERALIZATION OF THE LAW OF LARGE NUMBERS
By Hilda Geiringer

It is well known that the law of large numbers can be established for dependent
as well as for independent chance variables by using Tchebycheff's inequality [1]
and assuming that the variance of the sum of the variables tends towards
infinity less rapidly than $n^2$.

In recent years v. Mises has introduced the notion of statistical functions [2]
and has shown that, under certain assumptions, the law of large numbers is still
valid if, instead of the arithmetic mean of the $n$ observations $x_1, \ldots, x_n$, a
statistical function of these observations is considered. For example, in the very
special case where the $n$ collectives which have been observed are identical
$k$-valued arithmetic distributions with probabilities $p_1, \ldots, p_k$ corresponding
to the attributes $c_1, \ldots, c_k$, and with observed relative frequencies $n_1/n, \ldots, n_k/n$,
one obtains the result: it is to be expected for every $\varepsilon > 0$, with a probability
$P_n$ converging towards one as $n \to \infty$, that $|f(n_1/n, \ldots, n_k/n) - f(p_1, \ldots, p_k)| < \varepsilon$
under very general conditions concerning the function $f$.

In the present paper we shall generalize these new results so that they will
apply also to collectives which are not independent.

1. Lemma concerning alternatives. Let us consider the $n$-dimensional
collective consisting of a sequence of $n$ trials and let us assume that the $n$ trials are
alternatives, i.e. for each trial there are only two possible results, which we
denote by "success," "failure," by "occurrence," "non-occurrence," or by
"1," "0." The total result of the $n$ trials is expressed by $n$ numbers each equal
to 0 or 1. Let $v(x_1, x_2, \ldots, x_n)$ be the probability of obtaining the result $x_1$
at the first trial, $x_2$ at the second one, ..., $x_n$ at the last one ($x_\nu = 0, 1$; $\nu = 1, \ldots, n$). In the same way we introduce

$v_{12}(x, y) = \sum_{x_3, \ldots, x_n} v(x, y, x_3, \ldots, x_n)$

and generally $v_{\mu\nu}(x, y)$ as the probability that the $\mu$th result equals $x$ and the $\nu$th
equals $y$ ($\mu \ne \nu$), and finally let $v_\mu(x) = \sum_y v_{\mu\nu}(x, y)$ be the probability that the
$\mu$th result equals $x$. In particular let us write

$v_\mu(1) = p_\mu$,  $v_{\mu\nu}(1, 1) = p_{\mu\nu}$  ($\mu, \nu = 1, \ldots, n$; $\mu \ne \nu$),

$p_\mu$ being the probability of success in the $\mu$th trial and $p_{\mu\nu}$ the probability of
simultaneous success in both the $\mu$th and $\nu$th trials.

The variance of the sum $x_1 + \cdots + x_n$ is easily found:

$s_n^2 = \operatorname{Var}(x_1 + \cdots + x_n) = \sum_{x_1, \ldots, x_n}(x_1 + \cdots + x_n - p_1 - \cdots - p_n)^2\,v(x_1, \ldots, x_n)$
$\quad = \sum (x_1 - p_1)^2 v(x_1, \ldots, x_n) + \cdots + 2\sum (x_1 - p_1)(x_2 - p_2)v(x_1, \ldots, x_n) + \cdots$
$\quad = \sum_{x_1}(x_1 - p_1)^2 v_1(x_1) + \cdots + 2\sum_{x_1, x_2}(x_1 - p_1)(x_2 - p_2)v_{12}(x_1, x_2) + \cdots$
$\quad = p_1(1 - p_1) + \cdots + p_n(1 - p_n) + 2(p_{12} - p_1p_2) + \cdots + 2(p_{n-1,n} - p_{n-1}p_n)$.

Thus:

(1)  $s_n^2 = \operatorname{Var}(x_1 + \cdots + x_n) = \sum_{\nu=1}^{n} p_\nu(1 - p_\nu) + 2\sum_{\mu<\nu}(p_{\mu\nu} - p_\mu p_\nu)$.
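Identity (1) holds for any joint distribution of alternatives, which makes it easy to check mechanically. The sketch below computes $\operatorname{Var}(x_1 + \cdots + x_n)$ directly from a joint probability table $v$ and compares it with the right-hand side of (1). The dependent example is a hypothetical two-state chain in which each trial repeats the previous result with probability $3/4$. Exact fractions make the comparison an identity check rather than a numerical approximation. All names are illustrative.

```python
from fractions import Fraction
from itertools import product


def markov_pmf(n, stay=Fraction(3, 4)):
    """Illustrative dependent alternatives: x_1 is 0 or 1 with probability 1/2,
    and each later trial repeats the previous result with probability `stay`."""
    v = {}
    for x in product((0, 1), repeat=n):
        w = Fraction(1, 2)
        for a, b in zip(x, x[1:]):
            w *= stay if a == b else 1 - stay
        v[x] = w
    return v


def var_direct(v):
    """Var(x_1 + ... + x_n) computed straight from the joint probabilities v."""
    mean = sum(w * sum(x) for x, w in v.items())
    return sum(w * (sum(x) - mean) ** 2 for x, w in v.items())


def var_formula_1(v, n):
    """Right-hand side of (1): sum p_nu(1 - p_nu) + 2 sum_{mu<nu} (p_{mu nu} - p_mu p_nu)."""
    p = [sum(w for x, w in v.items() if x[m] == 1) for m in range(n)]
    total = sum(pm * (1 - pm) for pm in p)
    for mu in range(n):
        for nu in range(mu + 1, n):
            p_mn = sum(w for x, w in v.items() if x[mu] == 1 and x[nu] == 1)
            total += 2 * (p_mn - p[mu] * p[nu])
    return total


v = markov_pmf(6)
print(var_direct(v) == var_formula_1(v, 6))  # True
```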

The first sum on the right is $\le n/4$; the second one consists of $N = \tfrac12 n(n - 1)$
terms, therefore we cannot be sure that it tends toward zero after division by $n^2$.
Putting $p_{\mu\nu} - p_\mu p_\nu = \alpha_{\mu\nu}$ we see immediately:

(a) A necessary and sufficient condition for $\lim_{n\to\infty} s_n/n = 0$ is

(2)  $\lim_{n\to\infty} \dfrac{1}{n^2}\sum_{\mu<\nu}\alpha_{\mu\nu} = 0$.

Denoting by $\sigma_\mu^2$ the variance of $v_\mu(x)$ and by $r_{\mu\nu}$ the correlation coefficient of
$v_{\mu\nu}(x, y)$, we have

$\alpha_{\mu\nu} = p_{\mu\nu} - p_\mu p_\nu = r_{\mu\nu}\sigma_\mu\sigma_\nu$.

We see that $\alpha_{\mu\nu}$ takes values between $-1/4$ and $+1/4$, and our condition (2)
postulates that the sum of these positive and negative terms tends towards
infinity less rapidly than $n^2$. As to the meaning of the signs of these terms, we
see that a term will be $> 0$, $= 0$ or $< 0$ according as $p_{\mu\nu}/p_\nu >$, $=$ or $< p_\mu$. This means: the
fact that the $\nu$th event has presented itself makes the occurrence of the $\mu$th
event either more probable, or it is without influence on it, or it makes it less
probable. And we see that $s_n/n$ tends toward zero only if there is a certain
"equalization" or "stabilization" of positive and negative mutual influence.
If in particular for a pair of values $\mu, \nu$, $r_{\mu\nu} = +1$, that is $v_{\mu\nu}(0, 1) = v_{\mu\nu}(1, 0) = 0$,
the events must either both occur or both fail, and $p_\mu = p_\nu$. If $r_{\mu\nu} = -1$ we
have $v_{\mu\nu}(0, 0) = v_{\mu\nu}(1, 1) = 0$: the simultaneous occurrence is impossible and
likewise the simultaneous failure, and $p_\mu + p_\nu = 1$. If we have $p_{\mu\nu} = 0$ (case of
mutually exclusive events), then $p_\mu + p_\nu \le 1$.
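A simple exchangeable model shows what happens when condition (2) fails. Suppose, as an illustrative assumption, that with probability $1/2$ all trials are independent with success probability $0.2$, and otherwise all are independent with success probability $0.8$. Then every $p_\nu = 1/2$ and every $\alpha_{\mu\nu} = (0.2^2 + 0.8^2)/2 - 1/4 = 0.09 > 0$. By (1), $s_n^2/n^2 \to 0.09$ rather than 0, and the relative frequency does not settle near $1/2$. The sketch computes $s_n^2/n^2$ and the exact probability $W$ of a deviation larger than $\eta$. Names are hypothetical.

```python
from math import comb


def mixture_pmf(n, a=0.2, b=0.8):
    """P(S = m), m = 0..n, for S = x_1 + ... + x_n when, with probability 1/2 each,
    all trials are independent with success probability a, or all with b."""
    return [0.5 * comb(n, m) * (a ** m * (1 - a) ** (n - m) + b ** m * (1 - b) ** (n - m))
            for m in range(n + 1)]


def s2_over_n2(n, a=0.2, b=0.8):
    """s_n^2 / n^2 from (1): every p_nu = (a + b)/2 and every alpha_{mu nu} is equal."""
    p = (a + b) / 2
    alpha = (a * a + b * b) / 2 - p * p
    return (n * p * (1 - p) + n * (n - 1) * alpha) / n ** 2


def prob_far_from_mean(n, eta, a=0.2, b=0.8):
    """W = P(|S/n - p| > eta), summed exactly from the mixture pmf."""
    p = (a + b) / 2
    return sum(w for m, w in enumerate(mixture_pmf(n, a, b)) if abs(m / n - p) > eta)


for n in (10, 100, 400):
    print(n, round(s2_over_n2(n), 4), round(prob_far_from_mean(n, 0.1), 4))
```

As $n$ grows, $s_n^2/n^2$ stays near $0.09$ and $W$ approaches 1, in line with the necessity part of (a).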

Since $s_n^2 \ge 0$ and $\sum_{\nu=1}^{n} p_\nu(1 - p_\nu) \le n/4$, we conclude from (1) that
$\sum_{\mu<\nu}\alpha_{\mu\nu} \ge -n/8$, and we obtain the following simple sufficient condition for the
validity of (2):
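(b) Let $m_n$ denote the number of all combinations $\mu, \nu$ ($\mu < \nu \le n$) such that $\alpha_{\mu\nu} > \varepsilon$, where $\varepsilon$ is a given positive number. Then $\frac{1}{n^2}\sum_{\mu<\nu}\alpha_{\mu\nu}$ converges toward zero if $\lim_{n\to\infty} m_n/n^2 = 0$ for every such $\varepsilon$.

We have in fact

$-\dfrac{n}{8} \le \sum_{\mu<\nu}\alpha_{\mu\nu} \le m_n + (N - m_n)\varepsilon$,

and dividing by $n^2$ we find that $\frac{1}{n^2}\sum\alpha_{\mu\nu}$ is enclosed between $-\frac{1}{8n}$ and $\frac{m_n}{n^2} + \frac{(N - m_n)\varepsilon}{n^2}$. The lower bound tends toward zero, and since $N < n^2/2$ the upper bound has an upper limit of at most $\varepsilon/2$, which is as small as we please. Roughly speaking, this condition implies that for "almost all" combinations of indices $\mu, \nu$ the $\alpha_{\mu\nu}$ converge toward "negative or vanishing correlation."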


On the other hand the sum of all positive and negative terms in $\sum\alpha_{\mu\nu}$
cannot become less than $-n/8$. Therefore, if "almost all" positive terms are
supposed to tend towards zero, it follows that also almost all negative terms
tend toward zero. Thus we obtain the sufficient condition (c), which is neither
more nor less general than (b):

(c) The sum $\frac{1}{n^2}\sum_{\mu<\nu}\alpha_{\mu\nu}$ tends towards zero if "almost all" the individual terms $\alpha_{\mu\nu} = p_{\mu\nu} - p_\mu p_\nu$ tend toward zero. Or more exactly, the sum in question tends toward zero if $|\alpha_{\mu\nu}| \le \varepsilon$ for every $\varepsilon$ and sufficiently large $n$, with the exception of $M_n$ terms where $\lim_{n\to\infty} M_n/n^2 = 0$. That is "convergence towards independence" for almost all combinations $\mu, \nu$ of indices. Let us, for example, assume that all the $p_\nu$ are $> 0$ and all the $p_{\mu\nu} = 0$; then all the $\alpha_{\mu\nu}$ are certainly $< 0$ and (b) is fulfilled; but it is easily seen [3] that in this case $p_1 + p_2 + \cdots + p_n \le 1$. Therefore all the products $p_\mu p_\nu$ (with the possible exception of a finite number) tend toward zero, and (c) holds as well.


2. Statistical functions. Suppose $n$ observations have given the results
$x_1, x_2, \ldots, x_n$. Let us assume for the sake of simplicity that they are all
bounded between two real numbers $A$ and $B$. To each real $x$ corresponds the
number $nS_n(x)$ of observations with a result $\le x$. $S_n(x)$ is a monotone non-decreasing
step function with $n$ steps, each of height $1/n$; however, several steps
may coincide at the same point. We have

(1)  $S_n(x) = 0$ if $x < A$ and $S_n(x) = 1$ if $x \ge B$.

$S_n(x)$ is called by v. Mises the partition (Aufteilung) of the $n$ observations.
$S_n(x)$ coincides with the well-known cumulative frequency distribution if the
attributes $c_\kappa$ ($\kappa = 1, \ldots, k$) and the corresponding relative frequencies $n_1/n, \ldots, n_k/n$ are given.
A statistical function is a function of the $x_1, x_2, \ldots, x_n$ which depends only on
$S_n(x)$, the partition of the $n$ results. It will be denoted by $f\{S_n(x)\}$. If the $c_\kappa$
and the $n_\kappa/n$ are given, then statistical function means simply "function of the
relative frequencies," and it becomes a function of $k$ variables. In $f\{S_n(x)\}$ the
partition $S_n(x)$ takes the place of the independent variable. Such a statistical
function has the following properties: (a) It is a symmetric function of the
$x_1, x_2, \ldots, x_n$; that is, it is independent of the succession of the $n$ results.
(b) It is "homogeneous" in the following sense: if instead of $n$ observations
we have $nl$ observations, obtained by repeating each $x_\nu$ $l$ times, then
the statistical function is not changed.¹ Examples of statistical functions are
the moments

$\dfrac{1}{n}\sum_{\nu=1}^{n} x_\nu^r = \int x^r\,dS_n(x) = M_r$

or, if $M_1 = a$, the moments about the mean $a$:

$\dfrac{1}{n}\sum_{\nu=1}^{n}(x_\nu - a)^r = \int (x - a)^r\,dS_n(x) = \mu_r$, etc.
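The two properties of a statistical function, symmetry and homogeneity, are easy to see in code. The sketch below uses illustrative names. It computes the partition $S_n(x)$ and the moments $M_r$ and $\mu_r$ of a small sample, and checks that they are unchanged when every observation is repeated $l$ times. It also contrasts the homogeneous geometric mean with the product $x_1x_2\cdots x_n$, which is not homogeneous.

```python
import math
from fractions import Fraction


def partition(xs):
    """S_n(x): proportion of the observations that are <= x."""
    data = sorted(xs)
    return lambda x: Fraction(sum(1 for d in data if d <= x), len(data))


def moment(xs, r):
    """M_r = (1/n) * sum x_nu**r = integral of x**r dS_n(x)."""
    return Fraction(sum(Fraction(x) ** r for x in xs), len(xs))


def central_moment(xs, r):
    """mu_r = (1/n) * sum (x_nu - a)**r with a = M_1."""
    a = moment(xs, 1)
    return Fraction(sum((Fraction(x) - a) ** r for x in xs), len(xs))


def geometric_mean(xs):
    return math.exp(sum(math.log(x) for x in xs) / len(xs))


xs = [1, 2, 2, 5]
rep = [x for x in xs for _ in range(3)]      # each observation repeated l = 3 times
print(moment(xs, 1), central_moment(xs, 2))  # 5/2 9/4
print(all(partition(xs)(t) == partition(rep)(t) for t in range(7)))  # True
print(central_moment(rep, 3) == central_moment(xs, 3))               # True
print(math.isclose(geometric_mean(rep), geometric_mean(xs)))         # True
print(math.prod(xs), math.prod(rep))         # 20 8000
```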

The independent variable in $f\{S_n(x)\}$ is a partition; but in addition we shall
define $f\{P(x)\}$, where $P(x)$ is a certain bounded distribution which is not necessarily
a partition. A distribution $P(x)$ is called bounded if

(1′)  $P(x) = 0$ if $x < A$ and $P(x) = 1$ if $x \ge B$.

If this is true for a sequence $P_1(x), P_2(x), \ldots$ with the same $A$ and $B$, then the
sequence is called uniformly bounded. Let us now consider a bounded distribution
$P(x)$ which at every point of continuity of $P(x)$ is the limit as $n \to \infty$ of a sequence
of bounded partitions $S_n(x)$. If, as $S_n(x)$ converges toward $P(x)$,
$f\{S_n(x)\}$ converges towards a limit $L$ which does not depend on the limiting
process $S_n(x) \to P(x)$, then that limit shall be denoted by $f\{P(x)\}$; it will be
called the value of the statistical function at the "point" $P(x)$, and $f\{S_n(x)\}$ will be
called continuous at $P(x)$. The definition of continuity can be given also in the
following way: corresponding to every $\varepsilon > 0$ there exists an $\eta > 0$ such that

(2)  $|f\{S_n(x)\} - f\{P(x)\}| < \varepsilon$

for all values of $n$ and for every bounded $S_n(x)$ such that at every point of
continuity of $P(x)$

(3)  $|S_n(x) - P(x)| \le \eta$.

In this case $f\{S_n(x)\}$ is called continuous at the point $P(x)$. Thus a statistical
function is defined for bounded partitions and for certain bounded distributions
which are not themselves partitions. If the continuity defined by (2) and (3)
exists for a sequence $P_1(x), P_2(x), \ldots$ of bounded distributions with the same $\eta$
corresponding to a given $\varepsilon$, we call the statistical function uniformly continuous
at the points $P_1(x), P_2(x), \ldots$.

¹ This condition of homogeneity is fulfilled, e.g., for $\sqrt[n]{x_1x_2\cdots x_n}$, but not for $x_1x_2\cdots x_n$.

3. The general law of large numbers. The generalization of the law of large
numbers which we have in mind can be demonstrated in a way analogous to the
demonstration given by v. Mises in the case of independent collectives, if we
introduce the results of paragraph 1 in order to estimate the variance. We shall
consider here only one-dimensional, bounded collectives in order to make clearer
what is essential in the generalization.

A sequence of dependent collectives $P_1(x), P_2(x), \ldots, P_n(x)$ can be given in
the following manner. Let $P(x_1, x_2, \ldots, x_n)$ be the probability that the result
of the first observation is $\le x_1$, of the second $\le x_2$, ..., of the $n$th $\le x_n$.
This distribution will be said to be bounded in $(A, B)$ if $P = 1$ when all the $x_\nu$
are $\ge B$ and $P = 0$ if at least one of these arguments is less than $A$. From this
$n$-dimensional distribution we deduce $n$ one-dimensional distributions

$P_1(x) = P(x, B, \ldots, B)$,  $P_2(x) = P(B, x, B, \ldots, B)$,  ...,  $P_n(x) = P(B, \ldots, B, x)$,

where $P_\nu(x)$ is the probability that the $\nu$th observation be $\le x$. The $P_\nu(x)$ are
uniformly bounded in $(A, B)$, which is a consequence of $P(x_1, x_2, \ldots, x_n)$ having
been assumed to be bounded in this interval. In an analogous way we deduce
from $P(x_1, x_2, \ldots, x_n)$ the $\tfrac12 n(n - 1)$ uniformly bounded two-dimensional
distributions

(2)  $P_{12}(x, y) = P(x, y, B, \ldots, B)$,  $P_{13}(x, y) = P(x, B, y, B, \ldots, B)$, ....

Here $P_{\mu\nu}(x, y)$ is the probability that the $\mu$th result is $\le x$ and the $\nu$th result $\le y$,
and we have $P_{\mu\nu}(x, y) = P_{\nu\mu}(y, x)$. Of course we have also

(1′)  $P_1(x) = P_{12}(x, B) = P_{13}(x, B) = \cdots = P_{1n}(x, B)$,
$P_2(x) = P_{12}(B, x) = P_{23}(x, B) = \cdots = P_{2n}(x, B)$, etc.

If we put $x = y$ in (2), we obtain $P_{\mu\nu}(x, x) = P_{\nu\mu}(x, x)$, and we introduce

(3)  $P_{\mu\nu}(x, x) = P_{\nu\mu}(x, x) = P_{\mu\nu}(x)$,

the probability that both the $\mu$th and the $\nu$th observation are $\le x$. Then $P_{\mu\nu}(x)$
equals zero if $x < A$ and equals one if $x \ge B$, and this is valid with the same $A$
and $B$ for all the distributions $P_{\mu\nu}(x)$.

Now if $p_1, p_2, \ldots, p_n$ are the probabilities of success for $n$ general alternatives,
Tchebycheff's Lemma asserts that the probability $W$ that the average
$(x_1 + x_2 + \cdots + x_n)/n$ of $n$ observations differs by more than $\eta$ from its expectation
$(p_1 + p_2 + \cdots + p_n)/n$ is subject to the following inequality:

(4)  $W \le \dfrac{1}{\eta^2}\operatorname{Var}\left(\dfrac{x_1 + x_2 + \cdots + x_n}{n}\right) = \dfrac{s_n^2}{\eta^2n^2}$.

Here $s_n^2$ is given by (1) of paragraph 1.
Let us introduce the average $\bar P_n(x)$ of the $P_\nu(x)$:

(5)  $\bar P_n(x) = [P_1(x) + P_2(x) + \cdots + P_n(x)]/n$,

and let $Q_n$ be the probability that at any point of continuity of $\bar P_n(x)$ the inequality

(6)  $|S_n(x) - \bar P_n(x)| > \eta$

holds. Our aim will be to show that for every $\eta$, under certain restrictions regarding the given collectives, $Q_n$ tends toward zero as $n$ tends toward infinity.

For a fixed point $x'$ the probabilities $P_\nu(x') = p_\nu$ and $P_{\mu\nu}(x') = p_{\mu\nu}$ are constants, and we put $\bar P_n(x') = \bar p_n = (p_1 + p_2 + \cdots + p_n)/n$. The probability that at $x'$

(7)  $|S_n(x') - \bar P_n(x')| > \eta/2$

is then, according to (4), smaller than $(s_n^2)_{x'}/((\eta/2)^2n^2) = 4(s_n^2)_{x'}/(\eta^2n^2)$. Here we denote by $(s_n^2)_{x'}$ the value of $s_n^2$ at $x'$ (as given by (1) in paragraph 1).

Now we divide the interval $(A, B)$ into $N$ parts in such a way that in every one of the $N$ intervals, e.g. in $(x', x'')$, the variation

(8)  $\delta = \bar P_n(x'') - \bar P_n(x') \le \eta/2$.

If there is at $x'$ (or at $x''$) a step of $\bar P_n(x)$, we take the limit which $\bar P_n(x)$ approaches as $x \to x'$ (or $x''$) from the interior of the interval. In order to obtain such a division we need only divide the total variation 1 of $\bar P_n(x)$ into $2/\eta$ equal parts and project these points of division on $\bar P_n(x)$, disposing however in a suitable way of horizontal parts of $\bar P_n(x)$. The abscissae of these points form the endpoints of the $N$ intervals. If there is a step of $\bar P_n(x)$ at an endpoint of one of these intervals, the variation in both the adjacent intervals can only be diminished. It is further possible that the two ends of an interval coincide, $x' = x''$; this will be so if $\bar P_n(x)$ has at $x'$ a step $> \eta/2$. In any case we have a division into $N \le 2/\eta$ intervals such that all the points of continuity of $\bar P_n(x)$ are enclosed in them, and in each of these intervals (8) is valid.

Let us now assume that at the left endpoint $x'$ of the $\nu$th interval $(x', x'')$ the inequality

(9)  $|S_n(x') - \bar P_n(x')| \le \eta/2$

is valid. Then we have for every $x$ between $x'$ and $x''$

(10)  $|S_n(x) - \bar P_n(x)| \le \eta/2 + \delta \le \eta$,

because, since $S_n(x)$ and $\bar P_n(x)$ are both monotone, the difference $S_n(x) - \bar P_n(x)$ cannot increase by more than $\delta \le \eta/2$ as $x$ varies from $x'$ to $x''$. Therefore, if (6) is valid for any point $x$ in this interval, then (7) must be valid for the left endpoint $x'$ of this interval, and the probability $q_\nu$ of this latter inequality is less than or equal to $4(s_n^2)_{x'}/(\eta^2n^2)$.

But there are $N$ intervals with the left endpoints $x_1', x_2', \ldots, x_N'$, and the probability that (6) may be valid at any point belonging to any one of these intervals is $\le q_1 + q_2 + \cdots + q_N$. Denoting by $s_n^2$ the greatest of the $N$ variances $(s_n^2)_{x_1'}, (s_n^2)_{x_2'}, \ldots, (s_n^2)_{x_N'}$, we have for $Q_n$ (which is the probability that (6) may be valid at any point of continuity of $\bar P_n(x)$) the inequality

(11)  $Q_n \le q_1 + q_2 + \cdots + q_N \le \dfrac{4Ns_n^2}{\eta^2n^2} \le \dfrac{8}{\eta^3}\,\dfrac{s_n^2}{n^2}$.

Therefore $Q_n$ tends toward zero for every $\eta$ if $s_n/n$ tends toward zero.

But according to (2) in paragraph 1, $s_n/n$ tends toward zero if for every $x$ in $(A, B)$

(12)  $\lim_{n\to\infty} \dfrac{1}{n^2}\sum_{\mu<\nu}[P_{\mu\nu}(x) - P_\mu(x)P_\nu(x)] = 0$.

Considering the definition of continuity of a statistical function, we have obtained the following result:

As in (1′), (2), (3) and (5), let $P_{\mu\nu}(x, y)$ be two-dimensional distributions ($\mu, \nu = 1, \ldots, n$; $\mu \ne \nu$), uniformly bounded in $(A, B)$; $P_{\mu\nu}(x, B) = P_\mu(x)$; $P_{\mu\nu}(x, x) = P_{\mu\nu}(x)$ and $\bar P_n(x) = \frac{1}{n}(P_1(x) + P_2(x) + \cdots + P_n(x))$. If the variable partition $S_n(x)$ is bounded in $(A, B)$ and if $f\{S_n(x)\}$ is uniformly continuous at the "points" $\bar P_1(x), \bar P_2(x), \ldots$, then the probability that

(13)  $|f\{S_n(x)\} - f\{\bar P_n(x)\}| > \varepsilon$

tends toward zero for every $\varepsilon$ as $n \to \infty$, provided (12) is uniformly valid for every $x$ in $(A, B)$.


4. Examples. Let us illustrate by simple examples.

1) In order to define the $P_\nu(x)$ etc. mentioned in our theorem, we define the $n$-dimensional distribution $P(x_1, x_2, \ldots, x_n)$ used at the beginning of paragraph 3 by indicating the probability density

(1)  $p(x_1, x_2, \ldots, x_n) = C_n[1 - x_1x_2\cdots x_n]$ in the "unit cube," $= 0$ elsewhere.

The corresponding probability distribution is

(2)  $P(x_1, x_2, \ldots, x_n) = \int_{-\infty}^{x_1}\cdots\int_{-\infty}^{x_n} p(t_1, t_2, \ldots, t_n)\,dt_1\cdots dt_n$.

By putting

(3)  $C_n\int_0^1\cdots\int_0^1 [1 - t_1t_2\cdots t_n]\,dt_1\cdots dt_n = C_n\left(1 - \dfrac{1}{2^n}\right) = 1$

we see that $P(x_1, x_2, \ldots, x_n)$ equals unity if all the arguments are $\ge 1$, and it equals zero if one of these arguments is less than 0. Therefore $P(x_1, x_2, \ldots, x_n)$ is bounded in the unit cube.
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(4) 


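From (1) we deduce the two-dimensional densities

$p_{\mu\nu}(x, y) = C_n\left[1 - \frac{xy}{2^{n-2}}\right]$ in the unit square,
$p_{\mu\nu}(x, y) = 0$ elsewhere,

and the distributions

(5) $P_{\mu\nu}(x, y) = \int_{-\infty}^{x}\int_{-\infty}^{y} p_{\mu\nu}(x, y)\,dx\,dy.$

We see that

$P_{\mu\nu}(x, y) = C_n xy\left[1 - \frac{xy}{2^n}\right]$ in the unit square,
$P_{\mu\nu}(x, y) = 0$ if $x$ or $y \le 0$,
$P_{\mu\nu}(x, y) = 1$ if $x$ and $y \ge 1$,

and e.g. for $x \ge 1$, $0 < y < 1$ we have $P_{\mu\nu}(x, y) = P_{\mu\nu}(1, y)$, etc. Thus the $P_{\mu\nu}(x, y)$ are completely given.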

It follows from (3) that $-C_n/2^n = 1 - C_n$; therefore, putting $C_n = C$, we have in (0, 1)

(6) $P_{\mu\nu}(x, x) = P_{\mu\nu}(x) = Cx^2 + (1 - C)x^4,$
$P_\mu(x) = Cx + (1 - C)x^2;$

therefore

(7) $P_{\mu\nu}(x) - P_\mu(x)P_\nu(x) = C(1 - C)x^2(1 - x)^2$

is < 0 for every x in (0, 1), since C > 1. For $x \le 0$, $P_{\mu\nu}(x)$ and $P_\mu(x)$ both equal zero, and for $x \ge 1$ they both equal 1. Therefore our conditions of paragraph 1 are fulfilled. We see that $C_n$ tends towards unity as $n \to \infty$; therefore for every x in (0, 1) $P_{\mu\nu}(x) - P_\mu(x)P_\nu(x)$ tends towards zero: we have "convergence towards independence", but by no means independence.
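As a numerical check on (6) and (7), the minimal Python sketch below integrates the density (1) by hand: the one-dimensional marginal is $C_n(1 - x/2^{n-1})$, giving $P_\mu(x) = C_n(x - x^2/2^n)$, and the two-dimensional distribution on the diagonal is $P_{\mu\nu}(x, x) = C_n(x^2 - x^4/2^n)$. It then compares their covariance-type difference with the closed form $C(1 - C)x^2(1 - x)^2$. The function names are illustrative only.

```python
def normalising_constant(n):
    # C_n (1 - 2**-n) = 1, from integrating (1) over the unit cube
    return 2**n / (2**n - 1)

def covariance_term(x, n):
    """P_{mu nu}(x, x) - P_mu(x) P_nu(x) for 0 < x < 1, computed
    directly from the marginals of p = C_n [1 - x_1 x_2 ... x_n]."""
    C = normalising_constant(n)
    P_single = C * (x - x**2 / 2**n)       # integral of C(1 - s/2^(n-1)) from 0 to x
    P_pair = C * (x**2 - x**4 / 2**n)      # C_n xy(1 - xy/2^n) with y = x
    return P_pair - P_single**2

def closed_form(x, n):
    # right-hand side of (7)
    C = normalising_constant(n)
    return C * (1 - C) * x**2 * (1 - x)**2

# The difference is negative in (0, 1) and shrinks as n grows
for n in (2, 5, 10, 20):
    print(n, covariance_term(0.5, n), closed_form(0.5, n))
```

The printed values agree for each n, and their size falls roughly like $2^{-n}$, which is the "convergence towards independence" described above.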

This example was based on a symmetric density. Let us give an example of asymmetric and arithmetic distributions. For the sake of simplicity let $P_1(x), P_2(x), \ldots$ be arithmetic distributions, each with only three steps, at x = 0, 1 and 2. As starting point we take the n-dimensional arithmetic distribution $v(x_1, x_2, \ldots, x_n)$, which gives the probability that the first result equals $x_1$, the second $x_2$, ..., the nth $x_n$, the $x_i$ being equal to 0 or 1 or 2; thus $v(x_1, x_2, \ldots, x_n)$ takes $3^n$ values, the sum of which equals unity. We deduce the two-dimensional distributions $v_{\mu\nu}(x, y)$, e.g. $v_{12}(x, y) = \sum_{x_3, \ldots, x_n} v(x, y, x_3, \ldots, x_n)$, the probability that the first result equals x and the second y, and finally the $v_1(x) = \sum_y v_{12}(x, y)$, etc. According to the definitions of $P_\mu(x)$ and $P_{\mu\nu}(x)$ we then have:
(8) $P_\mu(x) = \begin{cases} 0 & (x < 0) \\ v_\mu(0) & (0 \le x < 1) \\ v_\mu(0) + v_\mu(1) & (1 \le x < 2) \\ 1 & (2 \le x), \end{cases}$

(9) $P_{\mu\nu}(x) = \begin{cases} 0 & (x < 0) \\ v_{\mu\nu}(0, 0) & (0 \le x < 1) \\ v_{\mu\nu}(0, 0) + v_{\mu\nu}(1, 0) + v_{\mu\nu}(0, 1) + v_{\mu\nu}(1, 1) & (1 \le x < 2) \\ 1 & (2 \le x). \end{cases}$


Now we subject $v(x_1, \ldots, x_n)$ to the following conditions: every $v(x_1, \ldots, x_n)$ equals zero if it contains either at least two "zeros", or at least one "zero" and one "one", or at least two "ones". All the other v-values are supposed to be different from zero. Then we have

$v_{\mu\nu}(0, 0) = v_{\mu\nu}(1, 0) = v_{\mu\nu}(0, 1) = v_{\mu\nu}(1, 1) = 0,$

therefore $P_{\mu\nu}(x) = 0$ for $x < 2$ and $P_{\mu\nu}(x) = 1$ for $x \ge 2$. On the other hand $v_\mu(0) = v(2, 2, \ldots, 2, 0, 2, \ldots, 2)$ and $v_\mu(1) = v(2, 2, \ldots, 2, 1, 2, \ldots, 2)$; therefore $P_\mu(x) > 0$ for $0 \le x < 2$, and we thus have, for every finite n,

$P_{\mu\nu}(x) - P_\mu(x)P_\nu(x) = 0$ for $x < 0$ and $x \ge 2$,
$P_{\mu\nu}(x) - P_\mu(x)P_\nu(x) < 0$ for $0 \le x < 2$.

Therefore condition (b) of paragraph 1 is fulfilled, and thus (12) of paragraph 3 holds.
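The second example can be checked on a small case. The Python sketch below builds one hypothetical joint law $v$ on $\{0, 1, 2\}^n$ that satisfies the stated conditions. It puts equal weights on the $2n + 1$ admissible points, which is an arbitrary choice, since any positive weights would do. It then evaluates $P_\mu$ and $P_{\mu\nu}$ straight from their definitions as cumulative probabilities.

```python
import itertools

def make_v(n):
    """Hypothetical joint law v(x_1, ..., x_n) on {0, 1, 2}^n: nonzero only
    where at most one coordinate is 0 or 1 (the conditions in the text).
    Equal weights are an assumption made for this sketch."""
    support = [pt for pt in itertools.product((0, 1, 2), repeat=n)
               if sum(1 for xi in pt if xi in (0, 1)) <= 1]
    w = 1.0 / len(support)
    return {pt: w for pt in support}

def P_single(v, mu, x):
    # P_mu(x): probability that the mu-th result is <= x
    return sum(p for pt, p in v.items() if pt[mu] <= x)

def P_pair(v, mu, nu, x):
    # P_{mu nu}(x) = P_{mu nu}(x, x): both the mu-th and nu-th results <= x
    return sum(p for pt, p in v.items() if pt[mu] <= x and pt[nu] <= x)

v = make_v(4)
for x in (-0.5, 0.5, 1.5, 2.5):
    print(x, P_pair(v, 0, 1, x) - P_single(v, 0, x) * P_single(v, 1, x))
```

The printed difference is zero outside $[0, 2)$ and negative inside it, matching the sign pattern stated above.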

I hope to have the opportunity to discuss more general applications of this 
theorem later. 

A generalization of the strong law of large numbers may be given in a simi- 
lar way. 
CONDITIONS FOR UNIQUENESS IN THE PROBLEM OF MOMENTS

By M. G. Kendall


It was shown by Stieltjes [1] that in some circumstances it is possible for two different frequency distributions to have the same set of moments. For instance, the integral


around a contour consisting of the positive x-axis, the infinite quadrant and the positive y-axis is seen to be zero, and it follows that

$\int_0^\infty x^n e^{-x^{1/4}} \sin x^{1/4}\,dx = 0.$

Thus the frequency distribution

(1) $dF = \frac{1}{24}e^{-x^{1/4}}\left(1 - \lambda \sin x^{1/4}\right)dx, \qquad 0 < x < \infty, \quad 0 < \lambda < 1,$

has moments which are independent of $\lambda$, and equation (1) may be regarded as defining a whole family of distributions, each of which has the same moments. It is easy to see that moments of all orders exist, and in fact

$\mu'_r$ (about the origin) $= \frac{1}{6}(4r + 3)!$.

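A second example of the same kind, also due to Stieltjes, is the distribution

(2) $dF = \frac{1}{e^{1/4}\sqrt{\pi}}\, x^{-\log x}\left\{1 - \lambda \sin(2\pi \log x)\right\}dx, \qquad 0 < x < \infty, \quad 0 \le \lambda < 1,$

for which $\mu'_r = e^{r(r+2)/4}$.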
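Both Stieltjes families can be checked numerically. The sketch below computes the first few moments of (1) and (2) for several values of $\lambda$ by quadrature and compares them with $(4r + 3)!/6$ and $e^{r(r+2)/4}$. For (1) it substitutes $x = u^4$. For (2) it substitutes $x = e^t$, which turns $x^r x^{-\log x}\,dx$ into $e^{-t^2 + (r+1)t}\,dt$. Composite Simpson's rule and the truncation points are choices made only for this sketch.

```python
import math

def simpson(f, a, b, n=20000):
    """Composite Simpson's rule on [a, b] with n (even) subintervals."""
    h = (b - a) / n
    s = f(a) + f(b)
    for k in range(1, n):
        s += (4 if k % 2 else 2) * f(a + k * h)
    return s * h / 3

def moment_eq1(r, lam):
    # r-th moment of (1) after x = u^4, dx = 4u^3 du; the tail beyond u = 80 is negligible
    f = lambda u: u**(4 * r) * math.exp(-u) * (1 - lam * math.sin(u)) * 4 * u**3 / 24
    return simpson(f, 0.0, 80.0)

def moment_eq2(r, lam):
    # r-th moment of (2) after x = e^t
    c = 1 / (math.exp(0.25) * math.sqrt(math.pi))
    f = lambda t: c * math.exp(-t * t + (r + 1) * t) * (1 - lam * math.sin(2 * math.pi * t))
    return simpson(f, -15.0, 15.0 + r)

for r in range(3):
    for lam in (0.0, 0.5, 0.9):
        # both ratios should be 1 whatever the value of lam
        print(r, lam,
              moment_eq1(r, lam) / (math.factorial(4 * r + 3) / 6),
              moment_eq2(r, lam) / math.exp(r * (r + 2) / 4))
```

The $\lambda$-dependent parts integrate to zero, so every printed ratio is 1 to within quadrature error.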




The question naturally arises, what are the conditions under which a given 
set of moments determines a frequency distribution uniquely? The question 
is of great interest to mathematicians, being closely linked with problems in the 
theory of asymptotic series, continued fractions and quasi-analytic functions;
and it also has importance for statisticians since there is sometimes occasion to
be satisfied that a problem of finding a frequency distribution has been uniquely
solved by the ascertainment of its moments or semi-invariants. Stieltjes himself considered a more general problem: given a set of constants $c_0, c_1, \ldots, c_r, \ldots$, does there exist a function F, non-decreasing and possessing an infinite number of points of increase, such that

(3) $\int_0^\infty x^r\,dF = c_r \qquad (r = 0, 1, 2, \ldots),$

and under what conditions is F unique, except for an additive constant?
Stieltjes showed that if we express the series


(4) $\sum_{r=0}^{\infty} \frac{(-1)^r c_r}{z^{r+1}}$

as a continued fraction of the form

(5) $\frac{1}{a_1 z +}\,\frac{1}{a_2 +}\,\frac{1}{a_3 z +}\,\frac{1}{a_4 +}\cdots\frac{1}{a_{2n-1} z +}\,\frac{1}{a_{2n} +}\cdots$

it is a necessary and sufficient condition for the existence of at least one F that all the a's be positive; and the function is unique or not according as the series $\sum_{r=1}^{\infty} a_r$ diverges or converges. (If the a's are positive it must do one or the other.) The integral of equation (3) is to be interpreted in the general Stieltjes sense, so that the result applies to discontinuous as well as to continuous distributions. This is also true of the results obtained below.

Hamburger [2] discussed the similar problem when the limits of the integral in equation (3) are $\pm\infty$, and showed that a function F exists if the expression of (4) as a continued fraction of the form

$\frac{b_0}{a_0 + z +}\,\frac{b_1}{a_1 + z +}\,\frac{b_2}{a_2 + z +}\cdots$

gives positive values of the b's. In order that F may be unique it is necessary and sufficient that the continued fraction be completely (vollständig) convergent in the sense defined by Hamburger.

Unfortunately these criteria, though mathematically complete, are not very 
useful to statisticians because as a rule it is too difficult to express the coefficients 
a and b explicitly enough in terms of the given c’s to enable questions of sign or 
of convergence to be decided. So far as I know, no more convenient criterion 
for the general Stieltjes problem has been found; but progress is possible if one 
considers the narrower question: given a set of moments, is the distribution 
which furnished them unique, that is to say, can any other distribution have 
furnished them? This is more limited than the Stieltjes problem because we 
know that at least one solution exists. 

Contributions to this subject have been made by Lévy [3] and Carleman [4]. Lévy shows that if moments of all orders exist and are positive, it is a sufficient condition for them to determine a distribution uniquely that $\mu_n^{1/n}/n$ remains finite as n tends to infinity. (Here and elsewhere in this paper $\mu_r$ refers to the moment of order r about any point, not necessarily the mean.) Carleman shows that, for the case of limits $-\infty$ to $+\infty$, the moments determine the distribution uniquely if

$\sum_{r=1}^{\infty} \frac{1}{\mu_{2r}^{1/(2r)}}$

diverges. For the limits 0 to $\infty$ he gives the corresponding series

$\sum_{r=1}^{\infty} \frac{1}{\mu_r^{1/(2r)}},$

a criterion which can be improved upon, as will be shown below.

The purpose of this paper is to develop criteria of this kind more systematically 
and to give more general criteria suitable in cases where the moments are not 
known explicitly but the behavior of the frequency distribution at its terminals 
is known. 

Three preliminary points necessary for the later argument may be noted.

(1) Define the absolute moment of order r by

$\nu_r = \int |x|^r\,dF,$

and recall that

$\nu_1 \le \nu_2^{1/2} \le \nu_3^{1/3} \le \cdots$

(cf. Hardy and others [5]). In other words, the quantities $\nu_r^{1/r}$ form an increasing positive sequence and their reciprocals a decreasing positive sequence.

(2) The quantity $\nu_n^{1/n}/n$ must either tend to a limit or diverge to infinity as $n \to \infty$. For suppose that

$\overline{\lim}\,\frac{\nu_n^{1/n}}{n} = k, \qquad \underline{\lim}\,\frac{\nu_n^{1/n}}{n} = l.$

Writing temporarily $\nu_n^{1/n} = a_n$, we have that, given $\epsilon$, there is an N such that

$a_n/n > k - \epsilon$

for an infinity of values of n greater than N. Similarly there is an M such that

$a_n/n < l + \epsilon$

for an infinity of values of n greater than M. Now choose p such that $a_p$, $a_{p+1}$ are two consecutive values, one near the upper limit and one near the lower limit. This can always be done, and we can take p as large as we please. We then have

$a_p > p(k - \epsilon),$
$a_{p+1} < (p + 1)(l + \epsilon),$

and hence, since $a_{p+1} \ge a_p$,

$(k - \epsilon)p < (p + 1)(l + \epsilon),$

giving

$k - l < 2\epsilon + \frac{l + \epsilon}{p}.$

Thus $k - l$ can be made as small as we please, and is thus zero.

The argument can be very simply adapted to the case in which k is infinite; and if l is not finite, k, being not less than l, is infinite. Thus as $n \to \infty$ either $\lim a_n/n$ exists or $a_n/n \to \infty$.¹

(3) If any moment fails to converge, so will all moments of higher order. It is evident that more than one distribution can exist having a limited number of finite moments given and the remainder infinite. Thus we need only consider the case when moments of all orders exist. Furthermore, if any even moment exists, the absolute moment of next lowest order must exist; for if $\int_{-\infty}^{\infty} x^{2n}\,dF$ exists, then each of $\int_{-\infty}^{0} x^{2n}\,dF$ and $\int_{0}^{\infty} x^{2n}\,dF$ exists separately, each being positive. Hence $\int_{-\infty}^{0} x^{2n-1}\,dF$ and $\int_{0}^{\infty} x^{2n-1}\,dF$ exist separately, and thus $\int_{-\infty}^{\infty} |x^{2n-1}|\,dF = -\int_{-\infty}^{0} x^{2n-1}\,dF + \int_{0}^{\infty} x^{2n-1}\,dF$ exists. Hence we need only consider the case in which absolute moments of all orders exist.

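Theorem 1. A set of moments determines a distribution uniquely if the series

$\sum_{r=0}^{\infty} \frac{\nu_r t^r}{r!}$

converges for some real non-zero t.

Consider the characteristic function

$\phi(t) = \int_{-\infty}^{\infty} e^{itx}\,dF.$

This is uniformly continuous in t, and so are its derivatives of all orders. Thus we have, in the neighborhood of t = 0, the Maclaurin expansion

$\phi(t) = \sum_{r=0}^{\infty} \frac{t^r}{r!}\left[\frac{d^r\phi}{dt^r}\right]_{t=0} = \sum_{r=0}^{\infty} \frac{(it)^r}{r!}\mu_r.$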

¹ This proof is necessary to the use of limits in the following theorems, but Theorems 2 and 3 are equally valid if $\overline{\lim}$ is substituted for lim therein. It is not generally true that if $a_n$ and $b_n$ are increasing monotonic sequences, either $\lim a_n/b_n$ exists or $a_n/b_n \to \infty$ as

Consequently, under the condition of the theorem, which implies that $\sum (it)^r\mu_r/r!$ is absolutely convergent for some radius $\rho$, $\phi(t)$ has a Taylor expansion in the neighborhood of the origin and is thus uniquely determined by the moments for $|t| < \rho$. Furthermore, in the neighborhood of $t = t_0$ we have

$\phi(t) = \sum_{r=0}^{n} \frac{(t - t_0)^r}{r!}\int_{-\infty}^{\infty} (ix)^r e^{it_0x}\,dF + R.$

The modulus of the coefficient of $(t - t_0)^r/r!$ is not greater than $\nu_r$. Therefore $\phi(t)$ can be expanded in the neighborhood of $t = t_0$ in a Taylor series with a radius of convergence at least equal to $\rho$. Hence the function defining $\phi(t)$ in the neighborhood of the origin can be continued analytically throughout the range $-\infty$ to $+\infty$, and is uniquely determined in that range.

But the characteristic function uniquely determines the distribution; and hence the theorem follows.

As a result of Theorem 1 we have the following generalization of the criterion given by Lévy.

Theorem 2. A set of moments completely determines a distribution if $\overline{\lim}_{n\to\infty} \nu_n^{1/n}/n$ is finite.

It has already been seen that, unless $\nu_n^{1/n}/n$ becomes infinite, the limit exists. By the Cauchy test for convergence the series $\sum \nu_r t^r/r!$ converges if

(7) $\lim_{n\to\infty}\left(\frac{\nu_n t^n}{n!}\right)^{1/n} < 1.$

As $n \to \infty$, $(n!)^{1/n}$ behaves, in accordance with Stirling's theorem, like $(2\pi n)^{1/(2n)}\,n/e$, i.e. like $n/e$. Consequently the condition (7) becomes

$\lim_{n\to\infty}\left[\nu_n^{1/n}/n\right] e t < 1.$

Thus if $\lim \nu_n^{1/n}/n = k$, say, the inequality (7) is satisfied for $t < 1/(ek)$, and the theorem follows.

An important corollary, which enables us to disregard the absolute moments (which may not be given if part of the range is negative), is

Theorem 3. A set of moments uniquely determines a distribution if $\overline{\lim}\,\mu_{2n}^{1/(2n)}/(2n)$ is finite.

For

$\nu_{2n-1}^{1/(2n-1)} \le \nu_{2n}^{1/(2n)} = \mu_{2n}^{1/(2n)}.$

Thus,

$\frac{\nu_{2n-1}^{1/(2n-1)}}{2n-1} \le \frac{2n}{2n-1}\cdot\frac{\mu_{2n}^{1/(2n)}}{2n},$

and this is therefore finite if the limit on the right is finite. Thus $\lim \nu_n^{1/n}/n$, which cannot be greater than the greater of the two limits of $\nu_{2n-1}^{1/(2n-1)}/(2n-1)$ and $\nu_{2n}^{1/(2n)}/(2n)$, must be finite; and the theorem follows from Theorem 2.
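To see how the quantity in Theorem 3 behaves in practice, the sketch below evaluates $\mu_{2n}^{1/(2n)}/(2n)$ on the log scale, to avoid overflow, for four moment sequences. These are the standard normal, $dF = e^{-x}dx$ (with $\mu_r = r!$), and the two Stieltjes examples (1) and (2). For (2) it uses $\mu'_r = e^{r(r+2)/4}$, so $\mu_{2n} = e^{n(n+1)}$. The criterion is only sufficient, so its divergence for (1) and (2) is consistent with, but does not by itself prove, their non-uniqueness.

```python
import math

def theorem3_ratio(log_mu_2n, n):
    """mu_{2n}^{1/(2n)} / (2n), computed from log(mu_{2n}) to avoid overflow."""
    return math.exp(log_mu_2n / (2 * n)) / (2 * n)

# log of the even moment mu_{2n} for four distributions discussed in the text
log_moments = {
    # standard normal: mu_{2n} = (2n)! / (2^n n!)
    "normal": lambda n: math.lgamma(2 * n + 1) - n * math.log(2) - math.lgamma(n + 1),
    # dF = e^{-x} dx on (0, inf): mu_r = r!
    "exponential": lambda n: math.lgamma(2 * n + 1),
    # Stieltjes example (1): mu_r = (4r + 3)!/6, so mu_{2n} = (8n + 3)!/6
    "stieltjes_1": lambda n: math.lgamma(8 * n + 4) - math.log(6),
    # Stieltjes example (2): mu_r = exp(r(r + 2)/4), so mu_{2n} = exp(n(n + 1))
    "stieltjes_2": lambda n: n * (n + 1),
}

for name, log_mu in log_moments.items():
    print(name, [round(theorem3_ratio(log_mu(n), n), 4) for n in (5, 50, 200)])
```

The normal ratio tends to 0 and the exponential ratio to $1/e$ (both finite), while both Stieltjes ratios grow without bound.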

Now consider the series $\sum 1/\nu_r^{1/r}$. Since the successive terms form a monotonic sequence, it is a sufficient as well as a necessary condition for convergence that $n/\nu_n^{1/n}$ tend to zero. Thus, if the series is divergent, $n/\nu_n^{1/n}$ cannot tend to zero, and so $\nu_n^{1/n}/n$ cannot become infinite. Hence it must tend to a finite limit, which may in particular be zero. Hence from Theorem 3 we get

Theorem 4. A frequency distribution is uniquely determined by its moments if

$\sum_{r=1}^{\infty} \frac{1}{\nu_r^{1/r}}$ diverges.

Since $1/\nu_r^{1/r}$ is a decreasing sequence, the series $\sum 1/\nu_r^{1/r}$ converges or diverges with $\sum 1/\nu_{2r}^{1/(2r)}$. The Carleman criterion, given by him for the case of limits $\pm\infty$, follows. For the case of limits 0 to $\infty$ the absolute moments are the same as the moments, and the criterion can be the divergence of either $\sum 1/\mu_r^{1/r}$ or $\sum 1/\mu_{2r}^{1/(2r)}$. Since $\mu_r$ is greater than unity in the type of case under consideration, the former series provides a more stringent test than that given by Carleman.

At first sight it is rather surprising that the uniqueness of the distribution depends only on the behavior of the even moments, particularly when, by a simple extension of the above result, it is seen that a sufficient condition for uniqueness is the divergence of $\sum 1/\mu_{4r}^{1/(4r)}$, or of the similar series formed from any infinite subset chosen from the moments. It will, however, be remembered that the odd moments are conditioned to some extent by the even moments, and that uniqueness is really determined by the limiting form of $\nu_n$ as n tends to infinity.

It is evident that other tests may be derived from Theorem 1 by using the 
various tests for the convergence of an infinite series. For instance it is a suffi- 
cient condition for a set of moments to determine uniquely a distribution with 
positive range that 


- / 
n!/ 


M<i+i 

(n+l)I 


1 + 




where 


a > 1 
/9 > 0 


i.e. that 

( 8 ) 


Mn+l 


1 + 




T > 0. 


It may be noted in passing that the distribution

$dF = e^{-x}\,dx, \qquad 0 < x < \infty,$

for which

$\mu_r$ (about origin) $= r!$,

is completely determined by its moments. In fact, by direct reference to Theorem 1 we see that the series $\sum (it)^r$ converges for $|t| < 1$.
A frequency distribution of finite range is uniquely determined by its moments. For if the range is 0 to A, we have

$\int_0^A x^r\,dF \le A^r,$

and hence $1/\nu_r^{1/r} \ge 1/A$, so that the series $\sum 1/\nu_r^{1/r}$ is divergent.

A proof for the case when the frequency distribution is continuous has been given by Lévy, though on entirely different lines from the above.

Theorem 5. A frequency distribution of infinite range is uniquely determined by its moments if it tends to zero at the infinite terminals faster than $e^{-x}$.

Consider first of all the case when only one end of the range is infinite, so that we may take the range to be 0 to $\infty$.

If $(\mu_n/n!)^{1/n}$ has a finite limit, the distribution is unique by Theorem 2. We then have only to consider the cases (if any) in which $(\mu_n/n!)^{1/n}$ tends to infinity. It will be shown that in fact such cases do not occur.

Given any (small) $\epsilon$ there exists an X such that

$f(x) < \epsilon e^{-x}, \qquad x > X,$

where $f(x)$ is the distribution. Thus

(9) $\int_X^\infty f(x)x^n\,dx < \epsilon\int_X^\infty e^{-x}x^n\,dx < \epsilon\,n!.$

This is true for all n, and X is independent of n. Now,

$\int_0^\infty f(x)x^n\,dx = \int_0^X f(x)x^n\,dx + \int_X^\infty f(x)x^n\,dx.$

The first integral on the right is not greater than $X^n$. The integral on the left tends, for large n, to something of greater order than $n!$, by our hypothesis, and hence to something of greater order than $X^n$ (since X, however large, is independent of n); consequently the second integral on the right is also of greater order than $n!$. But this is contrary to equation (9).

The case for the range which is infinite in both directions may be dealt with similarly.

It is easily seen that the two examples of equations (1) and (2) do not tend to zero faster than $e^{-x}$.

Except for the general result of Stieltjes, all the above criteria provide sufficient conditions, but whether the condition of Theorem 1 is also necessary is not certain. An inquiry into the circumstances in which the moment-series of Theorem 1 does not converge throws some light on the question.

It will be remembered that the characteristic function always exists and is uniformly continuous in t. Since the moments of all orders are assumed to exist, we always have

$\left[\frac{d^r\phi}{dt^r}\right]_{t=0} = i^r\mu_r.$

Thus, if $\phi(t)$ can be expanded in an infinite Taylor series, that series must be $\sum (it)^r\mu_r/r!$. And if this series does not converge, then $\phi(t)$ cannot be expanded as an infinite Taylor series. But it can always be expanded in the finite form with remainder

$\phi(t) = \sum_{r=0}^{n} \frac{(it)^r}{r!}\mu_r + R.$

Thus, when the series does not converge, $\phi(t)$ can be expanded in powers of t only asymptotically.

Now it is known that there exist an infinite number of functions which have a given set of coefficients in an asymptotic expansion; for instance, if $\phi(t)$ has an asymptotic expansion in t, the functions $\phi(t) + \lambda e^{-1/t}$ all have the same expansion. It is therefore hardly surprising that, when the conditions of Theorem 1 break down, there can be more than one frequency distribution with the same set of moments.

But it does not follow from what has been said that there must be more than one frequency distribution. There must be more than one function, but those functions may not qualify as frequency distributions; e.g. they may be negative in part of the range. In the example just given, $e^{-1/t}$ cannot be a characteristic function, for it does not obey the well-known condition that $\phi(t)$ and $\phi(-t)$ should be conjugate.

However, the question is more of mathematical than of statistical interest 
since the criteria provided above are likely to be adequate for the distributions 
encountered in practice. For example, they establish the uniqueness of the Pearson curves (including the normal curve), the Poisson and the binomial. It
would seem that distributions like those of equations (1) and (2) will appear 
only as statistical curiosities. 
ON SAMPLES FROM A NORMAL BIVARIATE POPULATION

By C. T. Hsu


1. Introduction. In a number of papers written during the last ten years, J. Neyman and E. S. Pearson* have discussed certain general principles underlying the choice of tests of statistical hypotheses. They have suggested that any formal treatment of the subject requires, in the first place, the specification of (i) the hypothesis to be tested, say $H_0$, and (ii) the admissible alternative hypotheses. An appropriate test will then consist of a rule, to be applied to observational data, for rejecting $H_0$ in such a way that (iii) the risk of rejecting $H_0$ when it is true is fixed at some desired value (e.g., 0.05 or 0.01), and (iv) the risk of failing to reject $H_0$ when some one of the admissible alternatives is true is kept as small as possible. With these general principles in mind, they have investigated how best condition (iv) may be satisfied in different classes of problems. In many cases, though not in all, it has been found that the conditions are satisfied by the test obtained from the use of what has been termed the likelihood ratio [9], [10], [14]. Once the problem has been specified, the test criterion is usually very easily found, although its sampling distribution, if $H_0$ is true, often presents great difficulties. In the present paper, I propose to use this method to obtain appropriate tests for a number of hypotheses concerning two normally correlated variables. The investigation was suggested by a recent application of the method by W. A. Morgan [6] to a problem originally discussed by D. J. Finney [3].


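2. The hypotheses and the appropriate criteria. A sample of two variables $x_1$ and $x_2$ is supposed to have been drawn at random from a normal bivariate population, with the distribution

(1) $p(x_1, x_2) = \frac{1}{2\pi\sigma_1\sigma_2\sqrt{1 - \rho_{12}^2}}\exp\left\{-\frac{1}{2(1 - \rho_{12}^2)}\left[\frac{(x_1 - \xi_1)^2}{\sigma_1^2} - \frac{2\rho_{12}(x_1 - \xi_1)(x_2 - \xi_2)}{\sigma_1\sigma_2} + \frac{(x_2 - \xi_2)^2}{\sigma_2^2}\right]\right\},$

where $\xi_1, \xi_2, \sigma_1, \sigma_2$ and $\rho_{12}$ are the population parameters.

Morgan tested the hypothesis that the variances of the two variables are equal, i.e.,

$H_1$: $\sigma_1 = \sigma_2$.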

* See bibliography at the end of the paper. 
Other hypotheses that will be considered in the present paper are as follows:

$H_2$: Assuming $\sigma_1 = \sigma_2$, to test $\rho_{12} = \rho_0$.

$H_3$: Assuming $\sigma_1 = \sigma_2$, to test $\xi_1 = \xi_2$.

$H_4$: To test simultaneously $\sigma_1 = \sigma_2$, $\rho_{12} = \rho_0$.

$H_5$: To test simultaneously $\sigma_1 = \sigma_2$, $\xi_1 = \xi_2$.

$H_6$: Assuming $\sigma_1 = \sigma_2$ and $\xi_1 = \xi_2$, to test $\rho_{12} = \rho_0$.

$H_7$: Assuming $\sigma_1 = \sigma_2$ and $\rho_{12} = \rho_0$, to test $\xi_1 = \xi_2$.


Derivation of the criteria. Let $x_{1i}, x_{2i}$ be the measurements of the two characters on the ith individual of the sample; then the joint elementary probability law of the two sets of n observations $E = (x_{11}, x_{12}, \ldots, x_{1n};\ x_{21}, x_{22}, \ldots, x_{2n})$ is

(2) $p(E \mid \xi_1, \xi_2, \sigma_1, \sigma_2, \rho_{12}) = \prod_{i=1}^{n} p(x_{1i}, x_{2i}).$

It will be convenient to denote by A, B, C, D the following conditions of the population from which the sample is supposed to be drawn.

(A) that stated in equation (1).

(B) that stated in the equation for $H_1$, namely $\sigma_1 = \sigma_2$ ($\sigma$ being unspecified).

(C) $\xi_1 = \xi_2$ ($\xi$ being unspecified).

(D) $\rho_{12} = \rho_0$.


Neyman and Pearson's method affords a simple rule for obtaining appropriate test criteria once two sets of conditions have been defined. These are

(a) the conditions which can be assumed to be satisfied in any case, and

(b) the conditions which are satisfied if the hypothesis to be tested is true.

The conditions (a) define a class $\Omega$ of admissible populations, and the conditions (b) define a sub-class $\omega$ of $\Omega$ to which the population must belong if the hypothesis tested be true.

The maximum value of $p(E \mid \xi_1, \xi_2, \sigma_1, \sigma_2, \rho_{12})$ when the parameters vary in such a way that the population sampled always belongs to $\Omega$ is called $p(\Omega\ \text{max.})$. The maximum value when the population is restricted to $\omega$ is called $p(\omega\ \text{max.})$. The likelihood ratio for testing the hypothesis specifying the subset $\omega$ has been defined to be

(3) $\lambda = \frac{p(\omega\ \text{max.})}{p(\Omega\ \text{max.})}.$

It will be seen that $0 \le \lambda \le 1$. By referring $\lambda$, or a monotonic function of $\lambda$, to its sampling distribution when the hypothesis tested is true, we obtain a scale on which to assess our judgment of the truth of the hypothesis tested.

For each of the hypotheses $H_1$ to $H_7$, the $\lambda$ of (3) can be found. However, we shall use a more convenient criterion,

(4) $L = \lambda^{2/n},$

which is a monotonic function of $\lambda$.

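Thus the respective test criteria are found to be:

For $H_1$:

(5) $L_1 = \frac{4s_1^2s_2^2(1 - r_{12}^2)}{(s_1^2 + s_2^2)^2(1 - R_1^2)},$

where $R_1 = \frac{2r_{12}s_1s_2}{s_1^2 + s_2^2}$ is the estimate of $\rho_{12}$ when $\sigma_1$ and $\sigma_2$ are assumed to be equal.

For $H_2$:

(6) $L_2 = \frac{(1 - \rho_0^2)(1 - R_1^2)}{(1 - \rho_0R_1)^2}.$

For $H_3$:

(7) $L_3 = 1\Big/\left\{1 + \frac{(\bar x_1 - \bar x_2)^2}{s_1^2 + s_2^2 - 2r_{12}s_1s_2}\right\}.$

For $H_4$:

(8) $L_4 = \frac{4(1 - \rho_0^2)s_1^2s_2^2(1 - r_{12}^2)}{(s_1^2 + s_2^2)^2(1 - \rho_0R_1)^2} = L_1 \times L_2.$

For $H_5$:

(9) $L_5 = \frac{4s_1^2s_2^2(1 - r_{12}^2)}{\{s_1^2 + s_2^2 + \tfrac12(\bar x_1 - \bar x_2)^2\}^2(1 - R_2^2)} = L_1 \times L_3.$

For $H_6$:

(10) $L_6 = \frac{(1 - \rho_0^2)(1 - R_2^2)}{(1 - \rho_0R_2)^2},$

where $R_2 = \frac{2r_{12}s_1s_2 - \tfrac12(\bar x_1 - \bar x_2)^2}{s_1^2 + s_2^2 + \tfrac12(\bar x_1 - \bar x_2)^2}$ is the estimate of $\rho_{12}$ when both the $\sigma$'s and the $\xi$'s are assumed to be equal.

For $H_7$:

(11) $L_7 = 1\Big/\left\{1 + \frac{(1 + \rho_0)(\bar x_1 - \bar x_2)^2}{2(s_1^2 - 2\rho_0r_{12}s_1s_2 + s_2^2)}\right\}.$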

The different hypotheses are also given in Table V, at the end of this paper, together with the conditions defining the sets $\Omega$ and $\omega$, and the appropriate likelihood criteria.
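The factorisations $L_4 = L_1 \times L_2$ and $L_5 = L_1 \times L_3$ in (8) and (9) are algebraic identities in $s_1, s_2, r_{12}$ and $\bar x_1 - \bar x_2$. The sketch below checks them at random admissible values. The function name `criteria` and the argument `d` (standing for $\bar x_1 - \bar x_2$) are illustrative choices for this sketch.

```python
import math
import random

def criteria(s1, s2, r, d, rho0):
    """Criteria (5)-(9) as functions of the sample quantities;
    d stands for the difference of sample means x1bar - x2bar."""
    q = s1**2 + s2**2
    R1 = 2 * r * s1 * s2 / q                                   # estimate used in (5)
    R2 = (2 * r * s1 * s2 - 0.5 * d**2) / (q + 0.5 * d**2)     # estimate used in (9)
    L1 = 4 * s1**2 * s2**2 * (1 - r**2) / (q**2 * (1 - R1**2))
    L2 = (1 - rho0**2) * (1 - R1**2) / (1 - rho0 * R1)**2
    L3 = 1 / (1 + d**2 / (q - 2 * r * s1 * s2))
    L4 = 4 * (1 - rho0**2) * s1**2 * s2**2 * (1 - r**2) / (q**2 * (1 - rho0 * R1)**2)
    L5 = 4 * s1**2 * s2**2 * (1 - r**2) / ((q + 0.5 * d**2)**2 * (1 - R2**2))
    return L1, L2, L3, L4, L5

L1, L2, L3, L4, L5 = criteria(1.2, 0.8, 0.4, 0.5, 0.3)
print(L4, L1 * L2)   # equal up to rounding
print(L5, L1 * L3)   # equal up to rounding
```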

To complete the solution we must find the distributions of L or some mono- 
tonic function of L in each case when the hypothesis tested is true, in order to 
assess the significance of an observed value of L. 


3. The distributions of the criteria. In order to simplify the problem of finding the distributions of the criteria, consider the following transformation:

(12) $X_i = (x_{1i} - x_{2i})/\sqrt{2}, \qquad Y_i = (x_{1i} + x_{2i})/\sqrt{2}.$

It is clear that, in view of (1), X and Y will be two normally correlated variables. We shall denote this property by A′, corresponding to A. The conditions B′, C′, D′ corresponding to B, C, D respectively are as follows:

B′: $\rho_{XY} = 0$,

C′: $\xi_X = 0$,

D′: $\sigma_Y^2/\sigma_X^2 = \gamma_0$ (when $\rho_{XY} = 0$),

where

(13) $\gamma_0 = \frac{1 + \rho_0}{1 - \rho_0}.$

Thus we have the equivalent hypotheses $H'_1, H'_2, \ldots, H'_7$ corresponding to $H_1, H_2, \ldots, H_7$. The likelihood ratios $L'_1, L'_2, \ldots, L'_7$ may be determined in the same way as before, and, in view of the transformation (12), it will be seen that they are equal to $L_1, L_2, \ldots, L_7$ respectively.

The tests of the hypotheses $H'_1, H'_2, H'_3$ are now seen to be well known. The test of $H'_1$: $\rho_{XY} = 0$ is the test for significance of a correlation coefficient, and the criterion $L'_1$ becomes

(14) $L'_1 = \lambda^{2/n} = 1 - r_{XY}^2.$

This test has been dealt with by Morgan [6] and Pitman [15], and has been referred to above.
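The equality of (5) and (14) can be confirmed on a simulated sample. The sketch below draws a hypothetical correlated sample, computes $L_1$ from (5), applies the transformation (12), and compares the result with $1 - r_{XY}^2$. Sample variances use the divisor n here, though both expressions are unchanged by any common divisor. The seed and population values are arbitrary.

```python
import math
import random

def sample_stats(x1, x2):
    """Means, standard deviations (divisor n) and product-moment correlation."""
    n = len(x1)
    m1, m2 = sum(x1) / n, sum(x2) / n
    s1 = math.sqrt(sum((a - m1)**2 for a in x1) / n)
    s2 = math.sqrt(sum((b - m2)**2 for b in x2) / n)
    r = sum((a - m1) * (b - m2) for a, b in zip(x1, x2)) / (n * s1 * s2)
    return m1, m2, s1, s2, r

random.seed(1)
n = 15
x1 = [random.gauss(0, 1) for _ in range(n)]
x2 = [0.6 * a + random.gauss(0, 1) for a in x1]   # arbitrary correlated population

m1, m2, s1, s2, r12 = sample_stats(x1, x2)
R1 = 2 * r12 * s1 * s2 / (s1**2 + s2**2)
L1_eq5 = 4 * s1**2 * s2**2 * (1 - r12**2) / ((s1**2 + s2**2)**2 * (1 - R1**2))

# transformation (12)
X = [(a - b) / math.sqrt(2) for a, b in zip(x1, x2)]
Y = [(a + b) / math.sqrt(2) for a, b in zip(x1, x2)]
rXY = sample_stats(X, Y)[4]
L1_eq14 = 1 - rXY**2

print(L1_eq5, L1_eq14)   # the two values agree up to rounding
```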

The test of $H'_2$: $\sigma_Y^2/\sigma_X^2 = \gamma_0$ when $\rho_{XY} = 0$ can be treated as an extension of Fisher's z-test [5], since $\gamma_0$ is specified. If we write

(15) $u = \frac{S_Y^2}{S_X^2} = \frac{1 + R_1}{1 - R_1} = \frac{s_1^2 + s_2^2 + 2r_{12}s_1s_2}{s_1^2 + s_2^2 - 2r_{12}s_1s_2},$

the test criterion $L_2$ of (6) may be written

(16) $L_2 = \frac{4u}{\gamma_0(1 + u/\gamma_0)^2}.$

It is well known that if $H'_2$ is true, then

(17) $p(u) = \frac{(u/\gamma_0)^{\frac12(n-3)}}{\gamma_0\,B[\tfrac12(n-1), \tfrac12(n-1)]\,(1 + u/\gamma_0)^{n-1}},$

and the test appropriate to $H'_2$, and therefore to $H_2$, is the associated z-test ($z = \tfrac12\log u/\gamma_0$) with degrees of freedom $f_1 = f_2 = n - 1$. It may easily be shown that the two values of u cutting off equal tail areas from the distribution p(u) will correspond to a single value of $L_2$.

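The test of $H'_3$: $\xi_X = 0$ when $\rho_{XY} = 0$ is in the form of "Student's" t-test. If we write

(18) $\frac{t^2}{n - 1} = \frac{\bar X^2}{S_X^2} = \frac{(\bar x_1 - \bar x_2)^2}{s_1^2 + s_2^2 - 2r_{12}s_1s_2},$

it follows that the test criterion $L_3$ of (7) may be written

(19) $L_3 = \left(1 + \frac{t^2}{n - 1}\right)^{-1}.$

But it is well known that if $\xi_X = 0$, then

(20) $p(t) = \frac{1}{\sqrt{n - 1}\,B[\tfrac12(n-1), \tfrac12]}\left(1 + \frac{t^2}{n - 1}\right)^{-\frac12 n}.$

The 5% or 1% points of significance of t may be obtained from Fisher's t-table [5] with degrees of freedom f = n − 1.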

The tests of $H'_4$ and $H'_5$. We infer from (14), (16) and (19) that $L_1$ is a function of $r_{XY}$, $L_2$ a function of $S_Y$ and $S_X$, and $L_3$ a function of $\bar X$ and $S_X$. It is clear that if $r_{XY}$ is distributed independently of $S_X$ and $S_Y$, then $L_1$ and $L_2$ are independent, i.e.,

(21) $p(L_1, L_2) = p(L_1)\,p(L_2),$

and that if $r_{XY}$ is distributed independently of $\bar X$ and $S_X$, then $L_1$ and $L_3$ are independent, i.e.,

(22) $p(L_1, L_3) = p(L_1)\,p(L_3).$

It is known that $\bar X$, $\bar Y$ are independent of $S_X$, $S_Y$, $r_{XY}$, and in addition that $r_{XY}$ is distributed independently of $S_X$, $S_Y$ if $\rho_{XY} = 0$. Therefore, if $H'_1$ is true, the relations (21) and (22) hold. Hence, knowing $p(L_1)$ and $p(L_2)$, a very simple transformation and integration gives $p(L_4)$. Similarly, the distribution of $L_5$ may be readily derived from those of $L_1$ and $L_3$.

But from the distribution of $r_{XY}$ when $\rho_{XY} = 0$, by transformation (14), the distribution of $L_1$, assuming $H'_1$ true, is found to be

(23) $p(L_1) = \frac{L_1^{\frac12(n-4)}(1 - L_1)^{-\frac12}}{B[\tfrac12(n-2), \tfrac12]}, \qquad 0 < L_1 < 1.$

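If $H'_2$ is true, from (17), by transformation (16) we have

(24) $p(L_2) = \frac{L_2^{\frac12(n-3)}(1 - L_2)^{-\frac12}}{B[\tfrac12(n-1), \tfrac12]}, \qquad 0 < L_2 < 1.$

Again, if $H'_3$ is true, from (20), by transformation (19) we have

(25) $p(L_3) = \frac{L_3^{\frac12(n-3)}(1 - L_3)^{-\frac12}}{B[\tfrac12(n-1), \tfrac12]}, \qquad 0 < L_3 < 1,$

which is the same as the distribution of $L_2$. Therefore, by comparing (21) and (22), we see that the distribution of $L_5$ when $H_5$ is true will be exactly the same as that of $L_4$ when $H_4$ is true. We shall therefore confine ourselves to the problem of obtaining the distribution of $L_4$ from those of $L_1$ and $L_2$.

Now

(26) $p(L_1, L_2) = \frac{L_1^{\frac12(n-4)}(1 - L_1)^{-\frac12}}{B[\tfrac12(n-2), \tfrac12]}\cdot\frac{L_2^{\frac12(n-3)}(1 - L_2)^{-\frac12}}{B[\tfrac12(n-1), \tfrac12]}.$

Applying the transformation

(27) $L_4 = L_1L_2, \qquad Z = L_2,$

and integrating with respect to Z from $L_4$ to 1, we obtain

(28) $p(L_4) = \tfrac12(n - 2)\,L_4^{\frac12(n-4)}, \qquad 0 < L_4 < 1.$

Thus we can construct the values of $L_4$ at the 5% and 1% levels for different values of n, as given in Table I.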


TABLE I

5% and 1% values of $L_4$ (or $L_5$)

    n      5%       1%
    5     .1357    .0464
    6     .2509    .1000
    7     .3017    .1585
    8     .3684    .2154
    9     .4249    .2683
   10     .4729    .3162
   12     .5493    .3981
   15     .6307    .4924
   20     .7169    .5995
   24     .7616    .6579
   30     .8074    .7197
   40     .8541    .7848
   60     .9019    .8532
  120              .9249
   ∞     1.0000   1.0000
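Integrating (28) gives $P(L_4 \le c) = c^{(n-2)/2}$, so the lower $\alpha$-point of $L_4$ is $\alpha^{2/(n-2)}$. The short sketch below uses this closed form to recompute Table I. It reproduces the printed entries to four decimals except the 5% entry for n = 6, where the formula gives .2236 rather than .2509, possibly a misprint in the original or the scan. It also gives about .9505 for the 5% entry at n = 120, which is missing from the scanned table.

```python
def critical_value(n, alpha):
    """Lower alpha-point of L4 under H4, from P(L4 <= c) = c**((n - 2)/2),
    which follows by integrating (28)."""
    return alpha ** (2.0 / (n - 2))

for n in (5, 6, 7, 8, 9, 10, 12, 15, 20, 24, 30, 40, 60, 120):
    print(f"{n:>4}  {critical_value(n, 0.05):.4f}  {critical_value(n, 0.01):.4f}")
```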
The test of $H'_6$. In the case of testing $\sigma_Y^2 = \gamma_0\sigma_X^2$, assuming $\rho_{XY}$ and $\xi_X$ each to be zero, the likelihood estimate of $\sigma_X^2$ becomes $\sum X^2/n$, or $S_X^2 + \bar X^2$. The distribution of this quantity is the same as that of $S_X^2$, but with n degrees of freedom instead of n − 1. Therefore, by analogy with the previous result (17) used in testing $H'_2$, if we write

(29) $v = \frac{S_Y^2}{S_X^2 + \bar X^2} = \frac{1 + R_2}{1 - R_2},$

then the likelihood criterion of $H'_6$ becomes

(30) $L_6 = \frac{4v}{\gamma_0(1 + v/\gamma_0)^2},$

and, if $H'_6$ is true,

(31) $p(v) = \frac{(v/\gamma_0)^{\frac12(n-3)}}{\gamma_0\,B[\tfrac12(n-1), \tfrac12 n]\,(1 + v/\gamma_0)^{n-\frac12}}.$

Hence the test appropriate to $H_6$ is the associated z-test, $z = \tfrac12\log(v/\gamma_0)$, with $f_1 = n - 1$, $f_2 = n$. We can use the z-table as before.

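The test of $H'_7$. Here we test whether $\xi_X = 0$. It may be seen that $L_7$ is a function of $\bar X^2/(S_Y^2 + \gamma_0S_X^2)$. Further, if we assume that $\rho_{XY} = 0$ and also that $\sigma_Y^2 = \gamma_0\sigma_X^2$, then it will follow that $\sum(X - \bar X)^2/\sigma_X^2$ and $\sum(Y - \bar Y)^2/(\gamma_0\sigma_X^2)$ are each distributed independently as $\chi^2$ with n − 1 degrees of freedom; hence their sum is distributed as $\chi^2$ with 2n − 2 degrees of freedom. Also, if $\xi_X = 0$ (and $H'_7$ is true), $\bar X$ will be distributed normally about zero with standard error $\sigma_X/\sqrt{n}$. Hence we may write

(32) $L_7 = \left(1 + \frac{t_1^2}{2n - 2}\right)^{-1},$

where

(33) $t_1 = \bar X\Big/\sqrt{\frac{\sum(X - \bar X)^2 + \sum(Y - \bar Y)^2/\gamma_0}{n(2n - 2)}}$

is distributed in accordance with "Student's" distribution with 2n − 2 degrees of freedom,

(34) $p(t_1) = \frac{1}{\sqrt{2n - 2}\,B[\tfrac12(2n-2), \tfrac12]}\left(1 + \frac{t_1^2}{2n - 2}\right)^{-\frac12(2n-1)}.$

In terms of the original variables,

(35) $\frac{t_1^2}{2n - 2} = \frac{\gamma_0\bar X^2}{\gamma_0S_X^2 + S_Y^2} = \frac{(1 + \rho_0)(\bar x_1 - \bar x_2)^2}{2(s_1^2 - 2\rho_0r_{12}s_1s_2 + s_2^2)}.$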
4. Comparison of the $R_1$-test and $R_2$-test with the $r_{12}$-test in cases where $H_2$ and $H_6$ are true respectively. It will be noted that in the preceding discussion we have been concerned with three different tests of the hypothesis that $\rho_{12}$ has some specified value $\rho_0$. When there is no information available regarding the means and standard deviations of $x_1$ and $x_2$, the test is based on the sampling distribution of the ordinary product-moment coefficient $r_{12}$. If it may be assumed that $\sigma_1 = \sigma_2$, then we have the estimate

$R_1 = \frac{2r_{12}s_1s_2}{s_1^2 + s_2^2}.$

If, besides $\sigma_1 = \sigma_2$, it may also be assumed that $\xi_1 = \xi_2$, then we have the estimate

$R_2 = \frac{2r_{12}s_1s_2 - \tfrac12(\bar x_1 - \bar x_2)^2}{s_1^2 + s_2^2 + \tfrac12(\bar x_1 - \bar x_2)^2}.$


From the point of view of testing hypotheses, all these criteria $r_{12}$, $R_1$, $R_2$ follow from the application of the likelihood ratio method. It will be noted that if $\sigma_1 = \sigma_2$, either the $r_{12}$ or the $R_1$ test may be used. But, insofar as the likelihood principle is accepted, the latter should be regarded as the "better" test. Again, if $\sigma_1 = \sigma_2$ and $\xi_1 = \xi_2$, all three tests may be used, but that based on $R_2$ will be the "best". A question of interest is to investigate just what is meant by the "better" or the "best" test. We may ask how far the improvements are sufficient to justify the use of the $R_1$ and $R_2$ tests in place of the more generally used $r_{12}$ test. One method of comparison is to examine what Neyman and Pearson [12] have termed the "power function" of the tests.

For example, when testing the hypothesis that a parameter 0 has the value 
^0 in the population sampled, the power of the test criterion T with regard to 
the alternative hypothesis that is givfen by the expression = 

P{T > Ta\0 = ®il where T{ is the value of T at the level of significance a. 
This quantity &{0) measures the chance that the test as specified will detect 
the fact that 0 = 0t, i.e., the chance of rejecting the hypothesis when it is not 
true. A test whose power function is never less than that of any other test is 
termed the uniformly most powerful test. 

If the permissible alternative hypotheses to $\theta = \theta_0$ are both $\theta < \theta_0$ and $\theta > \theta_0$, then the power of the test $T$ is given by the expression

$$\beta(\theta_1) = 1 - P\{T_\alpha < T < T'_\alpha \mid \theta_1\},$$

where $T_\alpha$ and $T'_\alpha$ are the values of $T$ at the two ends of the distribution at the level of significance $\alpha$. When the test is such that the power function has a minimum value $\alpha$ at $\theta = \theta_0$, it is said to be unbiased.

A test is termed biased if, for certain alternative hypotheses $\theta \neq \theta_0$, the chance of rejecting the hypothesis $\theta = \theta_0$ is less than the chance of rejecting this hypothesis when it is true.
In what follows it is proposed to compare the power functions of the tests based on $r_{12}$, $R_1$, and $R_2$ in order to obtain more complete evidence of the extent to which one is “better” than another.

The distribution of $R_1$.* We have obtained the distribution of $u$ when the corresponding hypothesis, and therefore the hypothesis underlying $R_1$, is true. We are now able to find the distribution of $R_1$ by applying the transformation of (15). Thus the distribution of $R_1$ in terms of $\rho_0$ is


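$$p(R_1 \mid \rho_{12} = \rho_0) = \frac{(1-\rho_0^2)^{\frac12(n-1)}(1-R_1^2)^{\frac12(n-3)}}{2^{n-2}B[\frac12(n-1), \frac12(n-1)]\,(1-\rho_0 R_1)^{n-1}}. \tag{36}$$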


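The significance of $R_1$ may be assessed by the $z$-test, where we take

$$z = \tfrac12\log_e\frac{u}{\gamma_0} = \tfrac12\log_e\frac{1+R_1}{1-R_1} - \tfrac12\log_e\frac{1+\rho_0}{1-\rho_0} = z' - \zeta_0, \text{ say}, \tag{37}$$

with $\gamma_0 = (1+\rho_0)/(1-\rho_0)$ and degrees of freedom $f_1 = f_2 = n - 1$. R. A. Fisher’s $z$-table may be used in this connection.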
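Since $\tfrac12\log_e[(1+R)/(1-R)]$ is the inverse hyperbolic tangent of $R$, the statistic (37) is easy to evaluate. The sketch below is illustrative only; the function name `z_for_R1` is hypothetical, and it merely returns $z$ together with the degrees of freedom $(n-1, n-1)$ with which Fisher's table would be entered.

```python
import math


def z_for_R1(R1, rho0, n):
    """Return (z, (f1, f2)) for the z-test of eq. (37).

    0.5*log((1+R)/(1-R)) equals math.atanh(R), so z = z' - zeta0.
    """
    z = math.atanh(R1) - math.atanh(rho0)
    return z, (n - 1, n - 1)


# The same value written through u and gamma0, as in the first member of (37)
R1, rho0 = 0.3, 0.6
u = (1 + R1) / (1 - R1)
gamma0 = (1 + rho0) / (1 - rho0)
print(z_for_R1(R1, rho0, 10), 0.5 * math.log(u / gamma0))
```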

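When $\rho_{12} = 0$, the distribution simplifies to

$$p(R_1 \mid \rho_{12} = 0) = \frac{(1-R_1^2)^{\frac12(n-3)}}{2^{n-2}B[\frac12(n-1), \frac12(n-1)]} = \frac{(1-R_1^2)^{\frac12(n-3)}}{B[\frac12(n-1), \frac12]}, \tag{38}$$

since $2^{n-2}B[\frac12(n-1), \frac12(n-1)]$ is equal to $B[\frac12(n-1), \frac12]$ by the duplication formula [16, p. 240].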

The distribution (38) is similar in form to that of $p(r_{12} \mid \rho_{12} = 0)$, with $n - 1$ degrees of freedom instead of $n - 2$. The significance levels of $R_1$ may then be obtained directly from the $r$-table [1] for the case $\rho_{12} = 0$, entering with $n - 1$ degrees of freedom.
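The correspondence can be checked numerically: the null density (38) of $R_1$ from $n$ pairs coincides with the null density of $r_{12}$ from $n + 1$ pairs, which has $n - 1$ degrees of freedom. The sketch below is a check under that reading, not a method from the text; the helper names are hypothetical, and it also confirms the duplication step numerically.

```python
import math


def log_beta(a, b):
    """Natural log of the complete beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def density_R1_null(R, n):
    """Eq. (38): (1 - R^2)^((n-3)/2) / B[(n-1)/2, 1/2]."""
    return math.exp(0.5 * (n - 3) * math.log1p(-R * R) - log_beta((n - 1) / 2, 0.5))


def density_r_null(r, N):
    """Null density of r12 from a sample of size N (N - 2 degrees of freedom)."""
    return math.exp(0.5 * (N - 4) * math.log1p(-r * r) - log_beta((N - 2) / 2, 0.5))


n = 10
for R in (-0.5, 0.0, 0.3, 0.9):
    # R1 from n pairs behaves like r12 from n + 1 pairs
    print(R, density_R1_null(R, n), density_r_null(R, n + 1))

# Duplication step: 2^(n-2) B[(n-1)/2, (n-1)/2] = B[(n-1)/2, 1/2]
print((n - 2) * math.log(2) + log_beta(4.5, 4.5), log_beta(4.5, 0.5))
```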

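The distribution of $R_2$. The distribution of $R_2$ may be obtained from that of $v$ when the corresponding hypothesis is true. It is

$$p(R_2 \mid \rho_{12} = \rho_0) = \frac{(1+\rho_0)^{\frac12 n}(1-\rho_0)^{\frac12(n-1)}(1+R_2)^{\frac12(n-3)}(1-R_2)^{\frac12(n-2)}}{2^{n-\frac32}B[\frac12(n-1), \frac12 n]\,(1-\rho_0 R_2)^{n-\frac12}}. \tag{39}$$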


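This agrees with the result first obtained by R. A. Fisher [4] by a different method. The significance of $R_2$ may be assessed by the $z$-test, where we take

$$z = \tfrac12\log_e\frac{v}{\gamma_0} = \tfrac12\log_e\frac{1+R_2}{1-R_2} - \tfrac12\log_e\frac{1+\rho_0}{1-\rho_0}, \tag{40}$$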
* Since finding the distribution of $R_1$, (36) and (38), and the relation between $R_1$ and $z'$, (37), my attention has been drawn to a recent paper by DeLury [2] in which the same results are obtained. Since my method of derivation is different from his, I have thought it worthwhile to retain it here.
with degrees of freedom $f_1 = n - 1$, $f_2 = n$. The tables for use with the $z$-test may be used in this connection.

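When $\rho_{12} = 0$, the distribution is simplified to

$$p(R_2 \mid \rho_{12} = 0) = \frac{(1+R_2)^{\frac12(n-3)}(1-R_2)^{\frac12(n-2)}}{2^{n-\frac32}B[\frac12(n-1), \frac12 n]}, \tag{41}$$

which is simply a Pearson Type I curve.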
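A quick way to check densities such as (36) and (39) is to verify that each integrates to one over $-1 < R < 1$. The sketch below does this with Simpson's rule; the function names are hypothetical, the numerical quadrature is this sketch's own choice, and the densities are set to 0 at $R = \pm 1$, which is valid for $n \ge 4$.

```python
import math


def log_beta(a, b):
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def density_R1(R, rho0, n):
    """Eq. (36); 0 at the end points R = +/-1 (valid for n >= 4)."""
    if abs(R) >= 1:
        return 0.0
    log_num = 0.5 * (n - 1) * math.log1p(-rho0 ** 2) + 0.5 * (n - 3) * math.log1p(-R * R)
    log_den = ((n - 2) * math.log(2) + log_beta((n - 1) / 2, (n - 1) / 2)
               + (n - 1) * math.log1p(-rho0 * R))
    return math.exp(log_num - log_den)


def density_R2(R, rho0, n):
    """Eq. (39); at rho0 = 0 it reduces to the Pearson Type I form (41)."""
    if abs(R) >= 1:
        return 0.0
    log_num = (0.5 * n * math.log1p(rho0) + 0.5 * (n - 1) * math.log1p(-rho0)
               + 0.5 * (n - 3) * math.log1p(R) + 0.5 * (n - 2) * math.log1p(-R))
    log_den = ((n - 1.5) * math.log(2) + log_beta((n - 1) / 2, n / 2)
               + (n - 0.5) * math.log1p(-rho0 * R))
    return math.exp(log_num - log_den)


def simpson(f, a, b, steps=2000):
    """Composite Simpson's rule on [a, b] with an even number of steps."""
    h = (b - a) / steps
    total = f(a) + f(b)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


for rho0 in (0.0, 0.6):
    print(rho0,
          simpson(lambda r: density_R1(r, rho0, 10), -1, 1),
          simpson(lambda r: density_R2(r, rho0, 10), -1, 1))
```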

Power functions of $R_1$ and $R_2$. In order to find the power functions of $R_1$ and $R_2$ with respect to alternative hypotheses $H_t$ specifying $\rho_{12} = \rho_t < \rho_0$, it will be convenient to consider the incomplete beta function distributions

$$p(x_1) = \frac{x_1^{\frac12(n-3)}(1-x_1)^{\frac12(n-3)}}{B[\frac12(n-1), \frac12(n-1)]}, \tag{42}$$

$$p(x_2) = \frac{x_2^{\frac12(n-3)}(1-x_2)^{\frac12(n-2)}}{B[\frac12(n-1), \frac12 n]}, \tag{43}$$

where

$$x_1 = \frac{u}{\gamma_0(1 + u/\gamma_0)} \quad\text{and}\quad x_2 = \frac{v}{\gamma_0(1 + v/\gamma_0)}.$$

From the Tables of the Incomplete Beta Function [13] we can find the values of $x_1'$ and $x_2'$ at the significance level $\alpha'$, i.e.,

$$I_{x_1'}[\tfrac12(n-1), \tfrac12(n-1)] = \alpha', \tag{44}$$

$$I_{x_2'}[\tfrac12(n-1), \tfrac12 n] = \alpha'. \tag{45}$$

The values of $R_1'(\alpha)$ and of $R_2'(\alpha)$ may then be calculated from the relations

$$R_1 = \frac{u-1}{u+1} = \frac{-1 + x_1 + \gamma_0 x_1}{1 - x_1 + \gamma_0 x_1}, \tag{46}$$

$$R_2 = \frac{v-1}{v+1} = \frac{-1 + x_2 + \gamma_0 x_2}{1 - x_2 + \gamma_0 x_2}. \tag{47}$$


The power functions of $R_1$ and $R_2$ thus found may be given as follows:

$$\beta'(\rho_t \mid R_1) = P\{R_1 < R_1'(\alpha) \mid \rho_t\}, \tag{48}$$

$$\beta'(\rho_t \mid R_2) = P\{R_2 < R_2'(\alpha) \mid \rho_t\}. \tag{49}$$

In the same way, for any alternative hypothesis $H_t$ specifying $\rho_{12} = \rho_t > \rho_0$, we can find the values of $x_1''$ and $x_2''$ at the significance level $\alpha''$, at the other end of the distribution, i.e.,

$$1 - I_{x_1''}[\tfrac12(n-1), \tfrac12(n-1)] = \alpha'', \tag{50}$$

$$1 - I_{x_2''}[\tfrac12(n-1), \tfrac12 n] = \alpha''. \tag{51}$$

Thence the corresponding values of $R_1''(\alpha)$ and $R_2''(\alpha)$ may be obtained, and their power functions are


$$\beta''(\rho_t \mid R_1) = P\{R_1 > R_1''(\alpha) \mid \rho_t\}, \tag{52}$$



420 


C. T. HSU


$$\beta''(\rho_t \mid R_2) = P\{R_2 > R_2''(\alpha) \mid \rho_t\}. \tag{53}$$

The power functions of $R_1$ and $R_2$ with respect to alternative hypotheses specifying $\rho_{12} = \rho_t < \rho_0$ and $\rho_t > \rho_0$ may now be obtained by adding (48) and (52), or (49) and (53), or, more simply,

$$\beta(\rho_t \mid R_1) = 1 - P\{R_1'(\alpha) < R_1 < R_1''(\alpha) \mid \rho_t\}, \tag{54}$$

$$\beta(\rho_t \mid R_2) = 1 - P\{R_2'(\alpha) < R_2 < R_2''(\alpha) \mid \rho_t\}, \tag{55}$$

where $R_1'(\alpha)$, $R_1''(\alpha)$; $R_2'(\alpha)$, $R_2''(\alpha)$ are the values of $R_1$ and $R_2$ at the two ends of the distribution at the significance level $\alpha = \alpha' + \alpha''$.
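The power functions (48) to (55) can be evaluated without printed tables by integrating the beta density directly. The sketch below rests on one step of reasoning: $u$ (or $v$) increases with $R_1$ (or $R_2$), and when $\rho_{12} = \rho_t$ the variate built with $\gamma_t = (1+\rho_t)/(1-\rho_t)$ in place of $\gamma_0$ has the beta distribution (42) or (43). Replacing the Tables of the Incomplete Beta Function by Simpson's rule is this sketch's own substitution, and all names are hypothetical.

```python
import math


def reg_inc_beta(x, a, b, steps=4000):
    """Regularized incomplete beta I_x(a, b) by Simpson's rule (a, b >= 1 assumed)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    inv_beta = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b))
    h = x / steps

    def f(t):
        return t ** (a - 1) * (1 - t) ** (b - 1) * inv_beta

    total = f(0.0) + f(x)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f(i * h)
    return total * h / 3


def beta_variate(R, rho):
    """x = w/(1 + w), w = u/gamma, u = (1 + R)/(1 - R), gamma = (1 + rho)/(1 - rho)."""
    w = ((1 + R) / (1 - R)) / ((1 + rho) / (1 - rho))
    return w / (1 + w)


def power_tails(rho_t, R_low, R_high, a, b):
    """(beta', beta'') of eqs. (48)-(53); their sum is eq. (54) or (55).

    Use a = b = (n - 1)/2 for R1, and a = (n - 1)/2, b = n/2 for R2.
    """
    lower = reg_inc_beta(beta_variate(R_low, rho_t), a, b)
    upper = 1.0 - reg_inc_beta(beta_variate(R_high, rho_t), a, b)
    return lower, upper


# Case (a): n = 10, rho0 = 0.6, using the rounded critical levels quoted in the text
for rho_t in (0.0, 0.6, 0.9):
    print(rho_t, power_tails(rho_t, -0.0034, 0.8831, 4.5, 4.5),
          power_tails(rho_t, -0.0487, 0.8632, 4.5, 5.0))
```

With the case (a) critical levels the results should agree with the corresponding power entries for $R_1$ and $R_2$ (for example, about .496 for the lower tail of $R_1$ at $\rho_t = 0$) to within the rounding of those levels.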

In view of the fact that after transformation the tests based on $R_1$ and $R_2$ are equivalent to tests regarding the equality of variances, it follows from Neyman and Pearson’s work [11] on the uniformly most powerful test of the hypothesis that $\sigma_x^2/\sigma_y^2 = \gamma_0$, with alternatives $\sigma_x^2/\sigma_y^2 = \gamma_t < \gamma_0$ (or $\gamma_t > \gamma_0$), that: (1) if $\sigma_1 = \sigma_2$ and the alternatives to $\rho_{12} = \rho_0$ are that $\rho_{12} = \rho_t < \rho_0$ (or, in a second case, $\rho_t > \rho_0$), the test based on $R_1$ is the uniformly most powerful test, i.e., it is more powerful than that based on $r_{12}$; and (2) if $\sigma_1 = \sigma_2$ and $\xi_1 = \xi_2$, then the test based on $R_2$ is the uniformly most powerful test, i.e., it is more powerful than those based on either $r_{12}$ or $R_1$.

For illustration, let us take a special case, say

(a) $n = 10$, $\rho_0 = 0.6$, $\alpha' = \alpha'' = 0.025$.

From the tables, we obtain the values

$x_1' = .198902$, $x_2' = .184863$,
$x_1'' = .801098$, $x_2'' = .772916$,

and by calculation the values

$R_1'(\alpha) = -.0034$, $R_1''(\alpha) = .8831$,
$R_2'(\alpha) = -.0487$, $R_2''(\alpha) = .8632$.
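The step from the tabulated beta percentage points to the critical levels is a direct application of (46) and (47) with $\gamma_0 = (1+\rho_0)/(1-\rho_0)$. The sketch below reproduces the four values for case (a); the helper name `R_from_x` is hypothetical.

```python
def R_from_x(x, rho0):
    """Eqs. (46)-(47): R = (-1 + x + gamma0*x) / (1 - x + gamma0*x)."""
    gamma0 = (1 + rho0) / (1 - rho0)
    return (-1 + x + gamma0 * x) / (1 - x + gamma0 * x)


# Case (a): tabulated beta points for n = 10, alpha' = alpha'' = 0.025
points = {"R1'": 0.198902, "R1''": 0.801098, "R2'": 0.184863, "R2''": 0.772916}
levels = {name: round(R_from_x(x, 0.6), 4) for name, x in points.items()}
print(levels)
```

Rounded to four places, the results agree with the critical levels $-.0034$, $.8831$, $-.0487$ and $.8632$ given above.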


The values of the power functions of $R_1$ and $R_2$ for specified values of $\rho_t$ have been calculated and are given in Table II. For $\rho_t < \rho_0$, a comparison of columns 2 and 4 will show that the test based on $R_2$ is uniformly more powerful than that based on $R_1$ (or, for $\rho_t > \rho_0$, a comparison of columns 3 and 5).

The unbiased test. When, however, the alternatives are that $\rho_{12} = \rho_t < \rho_0$ and $\rho_t > \rho_0$, questions of bias may be introduced.

In the case of $R_1$, it was established by J. Neyman in his lecture courses [8] that if we test whether $\sigma_x^2/\sigma_y^2 = \gamma_0$, where the alternatives are $\gamma_t < \gamma_0$ and $\gamma_t > \gamma_0$, and if the samples of $X$ and $Y$ are of equal size, then the test based on cutting off equal tail areas of the distribution of $x_1$ is unbiased and of the type B [7]. Therefore the same may be said of the $R_1$-test.

In the case of $R_2$, the equivalent transformed test is again whether $\sigma_x^2/\sigma_y^2 = \gamma_0$. But the test now corresponds to that in which an estimate of $\sigma_x^2$ is based on $f_1 = n - 1$ degrees of freedom and an estimate of $\sigma_y^2$ on $f_2 = n$ degrees of freedom. The degrees of freedom not being equal, it is known that if equal tail areas are cut off from the sampling distribution of $x_2$, this test will be biased. Neyman’s result [8] shows that if the lower and upper significance levels are taken at $x_2'$ and $x_2''$, then the equation

$$(x_2')^{\frac12(n-1)}(1 - x_2')^{\frac12 n} = (x_2'')^{\frac12(n-1)}(1 - x_2'')^{\frac12 n} \tag{56}$$

should be satisfied if the test is unbiased. Since in the present case, with the test based on an equal-tail-area critical region, the bias will be very small, the rejection levels $R_2'(\alpha)$ and $R_2''(\alpha)$ in the numerical investigation given in Table III have been selected taking equal tail areas for simplicity.
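Condition (56) can be checked for the equal-tail points of case (a). The sketch below assumes the general form $x^{\frac12 f_1}(1-x)^{\frac12 f_2}$ for each side (with $f_1 = f_2$ for $R_1$); it compares the logarithms of the two sides. The function name is hypothetical.

```python
import math


def condition_56_gap(x_low, x_high, f1, f2):
    """Log of x^(f1/2) (1 - x)^(f2/2) at x' minus the same at x''.

    The gap is zero when the two sides of eq. (56) are equal.
    """
    def log_side(x):
        return 0.5 * f1 * math.log(x) + 0.5 * f2 * math.log1p(-x)
    return log_side(x_low) - log_side(x_high)


n = 10
gap_R1 = condition_56_gap(0.198902, 0.801098, n - 1, n - 1)  # equal degrees of freedom
gap_R2 = condition_56_gap(0.184863, 0.772916, n - 1, n)      # f1 = n - 1, f2 = n
print(gap_R1, gap_R2)
```

The gap vanishes (up to rounding) for $R_1$, whose equal-tail points are symmetric about one half, but not for $R_2$. This is consistent with the remark that the equal-tail $R_2$ test is slightly biased.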

TABLE II

Values of the power functions of $R_1$ and $R_2$ with respect to alternative hypotheses
$\rho_{12} = \rho_t < \rho_0$ or $\rho_t > \rho_0$
($n = 10$; $\rho_0 = 0.6$; $\alpha' = \alpha'' = 0.025$)

  ρt      β′(ρt|R1)   β″(ρt|R1)   β′(ρt|R2)   β″(ρt|R2)
 -0.8      .9984
 -0.6      .9739
 -0.4      .8867
 -0.2      .7189                    .7360
  0.0      .4960       .0002        .5093       .0001
  0.2      .2744       .0008        .2809       .0006
  0.3      .1825       .0018        .1860       .0015
  0.4      .1106       .0042        .1111       .0037
  0.5      .0576       .0099        .0580       .0093
  0.6      .025        .025         .025        .025
  0.7      .0081       .0678        .0080       .0720
  0.8      .0015       .1995        .0015       .2150
  0.9      .0001       .5950        .0001       .6289
  0.95                 .8979                    .9150
  0.976                .9866                    .9897

If we now take a special case similar to (a) above, but taking equal tail areas, so that

$n = 10$, $\rho_0 = 0.6$, $\alpha = 0.05$ ($\alpha' = \alpha'' = 0.025$),

we can obtain the values of the $x$’s and of the $R$’s as before.

The values of the power functions of $R_1$ and $R_2$ for specified values of $\rho_t$ are given in columns 3 and 4 of Table III. These values are equal to the sums of the corresponding values in Table II. The values of the power functions of $R_1$ and $R_2$ for the following additional cases are also given in Table III:
(b) $n = 10$, $\rho_0 = 0.8$, $\alpha = 0.05$;

(c) $n = 20$, $\rho_0 = 0.6$, $\alpha = 0.05$;

(d) $n = 20$, $\rho_0 = 0.8$, $\alpha = 0.05$.


Comparison of the power functions. We may now deal with the question raised at the beginning of this section, namely, as to what is meant by the “better” or “best” test. We shall proceed to compare, for certain special cases, the power functions of the three tests, all of which are applicable where it may be assumed that $\sigma_1 = \sigma_2$, $\xi_1 = \xi_2$.

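In the first place it will be noted that the power function of the test based on equal tail areas of the $r_{12}$ distribution is

$$\beta(\rho_t \mid r_{12}) = 1 - P\{r_{12}'(\alpha) < r_{12} < r_{12}''(\alpha) \mid \rho_t\}, \tag{57}$$

where

$$\int_{-1}^{r_{12}'(\alpha)} p(r_{12} \mid \rho_{12} = \rho_0)\,dr_{12} = \tfrac12\alpha, \qquad P\{r_{12} > r_{12}''(\alpha) \mid \rho_0\} = \int_{r_{12}''(\alpha)}^{1} p(r_{12} \mid \rho_{12} = \rho_0)\,dr_{12} = \tfrac12\alpha, \tag{58}$$

and

$$p(r_{12} \mid \rho_{12} = \rho_0) = \frac{(1-\rho_0^2)^{\frac12(n-1)}(1-r_{12}^2)^{\frac12(n-4)}}{\pi\,(n-3)!}\,\frac{d^{\,n-2}}{d(\rho_0 r_{12})^{n-2}}\left\{\frac{\cos^{-1}(-\rho_0 r_{12})}{\sqrt{1-\rho_0^2 r_{12}^2}}\right\}. \tag{59}$$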

The probability that $r_{12}$ is less than some specified value may be obtained from Tables of the Correlation Coefficient (F. N. David, [1]), or, where these are not sufficiently detailed, by using R. A. Fisher’s $z'$-transformation for $r_{12}$ [4].

The cases considered are (a), (b), (c), (d) as defined above. The power functions of the three different tests (all based upon the equal tail areas of their distributions) are given in Table III. The figures for $r_{12}$ in brackets are those obtained by the $z'$-transformation approximation.
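The $z'$-approximation can be sketched as follows. This assumes the simplest form of Fisher's result: $z' = \tanh^{-1} r_{12}$ is treated as normal with mean $\tanh^{-1}\rho$ and variance $1/(n-3)$, without any small-sample bias correction. The values it gives may therefore differ somewhat from the bracketed figures in Table III. The function name is hypothetical; `statistics.NormalDist` is the Python standard-library normal distribution.

```python
import math
from statistics import NormalDist


def power_r12_zprime(rho_t, rho0, n, alpha=0.05):
    """Approximate equal-tail power of the r12 test via z' = atanh(r12).

    z' is taken as normal with mean atanh(rho) and variance 1/(n - 3).
    """
    std = NormalDist()
    se = 1 / math.sqrt(n - 3)
    half_width = std.inv_cdf(1 - alpha / 2) * se
    lower = math.atanh(rho0) - half_width   # z' value of r12'(alpha)
    upper = math.atanh(rho0) + half_width   # z' value of r12''(alpha)
    mean = math.atanh(rho_t)
    return std.cdf((lower - mean) / se) + 1 - std.cdf((upper - mean) / se)


for rho_t in (0.0, 0.6, 0.9):
    print(rho_t, round(power_r12_zprime(rho_t, 0.6, 10), 4))
```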

An examination of Tables II and III brings out the following points:

(1) For reasons given above, the $R_2$ test based on equal tail area critical regions is very slightly biased; the amount of this bias for the case $n = 10$, $\rho_0 = 0.6$, $\alpha = 0.05$ is shown in Table IV. This shows that the power of the $R_2$ test is less than 0.05, in the fifth or sixth decimal place, for $0.59 < \rho_t < 0.60$. As a result this test is very slightly less powerful than the other two tests for alternatives with $\rho_t$ slightly less than $\rho_0$. The effect is, however, of little importance.

(2) Except in this short range of $\rho_t$, we find that

$$\beta(\rho_t \mid R_2) \geq \beta(\rho_t \mid R_1) \geq \beta(\rho_t \mid r_{12}).$$



TABLE III

Comparison of the power functions of the $r_{12}$, $R_1$, and $R_2$ tests with respect to alternative hypotheses

(In each case the entries run in order of increasing $\rho_t$, the entry .0500 corresponding to $\rho_t = \rho_0$; “levels” gives the lower and upper critical levels. Figures in brackets are $z'$-approximations.)

(a) $n = 10$, $\rho_0 = 0.6$, $\alpha = 0.05$
  $r_{12}$: .9739, .8865, .7186, .4960, .2753, .1142, .0679, .0500, .0735, .1890, .5656 (.5569), (.8709), (.9822)
  $R_1$: .9739, .8867, .7189, .4962, .2752, .1148, .0675, .0500, .0759, .2010, .5951, .8979, .9866; levels −.0034, .8831
  $R_2$: .9807, .9005, .7360, .5094, .2815, .1148, .0673, .0500, .0800, .2165, .6290, .9150, .9897; levels −.0487, .8632

(b) $n = 10$, $\rho_0 = 0.8$, $\alpha = 0.05$
  $r_{12}$: .9887, .9557, .9742, .7158, .4727, .3330, .2005, .0969, .0500, .1466 (.14..), (.4689), (.8134), (.9845); levels .4004, .9574
  $R_1$: .9891, .9569, .8766, .7189, .4750, .3345, .2010, .0965, .0500, .1771, .5426, .8692, .9908; levels .3817, .9463
  $R_2$: .9921, .9650, .8909, .7360, .4877, .3427, .2047, .0971, .0500, .1904, .5763, .8896, .9938

(c) $n = 20$, $\rho_0 = 0.6$, $\alpha = 0.05$
  $r_{12}$: .9965, .9648, .8328, .5412, .2026, .0915, .0500, .1096, .3886, .9034 (.9004), (.9974); levels .2289, .8300
  $R_1$: .9967, .9663, .8369, .5456, .2036, .0917, .0500, .1119, .4010, .9106, .9974; levels .2253, .8201
  $R_2$: .9973, .9698, .8449, .5534, .2061, .0922, .0500, .1147, .4134, .9181, .9978; levels .2041, .8084

(d) $n = 20$, $\rho_0 = 0.8$, $\alpha = 0.05$
  $r_{12}$: .9952, .9624, .8062, .6309, .3920, .1589, .0500, .3272 (.3270), (.8547), (.9944); levels .5671, .9222
  $R_1$: .9959, .9663, .8170, .6432, .4011, .1617, .0500, .3493, .5871, .9947; levels .5613, .9158
  $R_2$: .9966, .9698, .8254, .6520, .4085, .1635, .0500, .3604, .8844, .9960; levels .5459, .9101
That is to say, the power function of the $R_2$ test never lies below those of the $R_1$ and $r_{12}$ tests, and that of the $R_1$ test never lies below that of the $r_{12}$ test.

(3) The gain in sensitivity, as measured by the chance that the test will detect that $\rho_t \neq \rho_0$, is, however, very small. Further, $R_1$ may only be used if it is known that $\sigma_1 = \sigma_2$, and $R_2$ only if it is known in addition that $\xi_1 = \xi_2$. It will only be in rather special problems that the statistician can feel confident that such assumptions are justified. We will therefore probably prefer the test based on the ordinary product-moment correlation coefficient $r_{12}$, since the slight loss in power will be felt to be outweighed by the gain in simplicity. It is, however, only after an objective comparison of the consequences of applying the three tests that a definite opinion on these points can be reached.


TABLE IV

($n = 10$; $\rho_0 = 0.6$; $\alpha = 0.05$)

  ρt       β′(ρt|R2)    β″(ρt|R2)    β(ρt|R2)
  0.5      .0580        .0093        .0673
  0.590    .0274235     .0225806     .0500041
  0.591    .0271778     .0228190     .0499968
  0.592    .0269359     .0230578     .0499937
  0.593    .0266934     .0232976     .0499910
  0.594    .0264515     .0235337     .0499852
  0.595    .0262096     .0237798     .0499894
  0.596    .0259677     .0240222     .0499899
  0.597    .0257257     .0242651     .0499908
  0.598    .0254838     .0245107     .0499945
  0.599    .0252419     .0247540     .0499959
  0.6      .025         .025         .05
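The bias shown in Table IV can be located mechanically. The sketch below copies the tabulated tail probabilities, adds the two tails as in (55), and reports where the total power falls below $\alpha = 0.05$. It is an illustration only; the variable names are hypothetical.

```python
# Table IV: rho_t -> (lower tail, upper tail) rejection probabilities of the
# equal-tail R2 test, n = 10, rho0 = 0.6, alpha = 0.05
table_iv = {
    0.500: (0.0580, 0.0093),
    0.590: (0.0274235, 0.0225806),
    0.591: (0.0271778, 0.0228190),
    0.592: (0.0269359, 0.0230578),
    0.593: (0.0266934, 0.0232976),
    0.594: (0.0264515, 0.0235337),
    0.595: (0.0262096, 0.0237798),
    0.596: (0.0259677, 0.0240222),
    0.597: (0.0257257, 0.0242651),
    0.598: (0.0254838, 0.0245107),
    0.599: (0.0252419, 0.0247540),
    0.600: (0.025, 0.025),
}
alpha = 0.05
power = {rho: lo + hi for rho, (lo, hi) in table_iv.items()}
below_alpha = sorted(rho for rho, p in power.items() if p < alpha)
worst = min(below_alpha, key=power.get)   # alternative with the largest bias
print(below_alpha, worst, round(alpha - power[worst], 7))
```

The power dips below 0.05 only for $0.591 \le \rho_t \le 0.599$, by at most about $1.5 \times 10^{-5}$ at $\rho_t = 0.594$. This agrees with the statement that the bias is confined to the fifth or sixth decimal place.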


6. Summary. Various hypotheses relating to a population of two normal correlated variates have been considered, and the appropriate test criteria for each hypothesis have been derived by the likelihood ratio method. The distributions of the likelihood ratio criteria, or of monotonic functions of them, have been obtained with the aid of transformation (14). References have been given to tables from which significance levels for use in conjunction with the tests
may be obtained; a new table of significance levels for the tests of Hi and Ht 
was given. 

The power functions of $r_{12}$, $R_1$ and $R_2$ have been compared; from these power functions it was concluded that $R_1$ and $R_2$ are suitable, respectively, for testing the hypothesis when $\sigma_1 = \sigma_2$ and when, in addition, $\xi_1 = \xi_2$.

In conclusion, I should like to express my indebtedness to Professor E. S. Pearson for continued advice and help in the preparation of this paper, and to Dr. A. Wald and Professor S. S. Wilks for valuable suggestions.





TABLE V

Conditions defining $\Omega$ and $\omega$ together with the likelihood criteria appropriate for testing the hypotheses $H_i$
ON A LEAST SQUARES ADJUSTMENT OF A SAMPLED FREQUENCY
TABLE WHEN THE EXPECTED MARGINAL TOTALS ARE KNOWN

By W. Edwards Deming and Frederick F. Stephan

1. Introduction. There are situations in sampling wherein the data furnished by the sample must be adjusted for consistency with data obtained from other sources or with deductions from established theory. For example, in the 1940 census of population a problem of adjustment arises from the fact that although there will be a complete count of certain characters for the individuals in the population, considerations of efficiency will limit to a sample many of the cross-tabulations (joint distributions) of these characters. The tabulations of the sample will be used to estimate the results that would have been obtained from cross-tabulations of the entire population.¹ The situation is shown in Fig. 1 in parallel tables for the universe and for the sample. For the universe the marginal totals $N_{i.}$ and $N_{.j}$ are known, but not the cell frequencies $N_{ij}$; for the sample, however, tabulation gives both the cell frequencies $n_{ij}$ and the marginal totals $n_{i.}$ and $n_{.j}$.

In estimating any cell frequency of the universe, such as $N_{ij}$, three possibilities present themselves: from the sample one may make an estimate from the $i$th row alone, another from the $j$th column alone, and still another from the over-all ratio $n_{ij}/n$. Specifically, the three estimates would be $n_{ij}N_{i.}/n_{i.}$, $n_{ij}N_{.j}/n_{.j}$, and $n_{ij}N/n$. As a result of sampling errors these will not be identical except by accident, and though any of them by itself may be considered accurate enough, still, if the whole $r \times s$ table of universe cell frequencies were so estimated, the marginal totals would not come out right. In this paper we present a rapid method of adjustment, which in effect combines all three of the estimates just mentioned, and at the same time enforces agreement with the marginal totals. The method is extended to varying degrees of cross-tabulation in three dimensions.
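The failure of the separate estimates to reproduce the margins is easy to see on a small table. The sketch below uses a hypothetical $2 \times 2$ sample and hypothetical universe margins, chosen only for illustration. The row-based estimates $n_{ij}N_{i.}/n_{i.}$ reproduce the row totals but not the column totals, and the column-based estimates $n_{ij}N_{.j}/n_{.j}$ do the reverse.

```python
def three_estimates(n, N_row, N_col, N_total):
    """Estimates of N_ij from the i-th row, the j-th column, and the over-all ratio."""
    r, s = len(n), len(n[0])
    n_row = [sum(row) for row in n]
    n_col = [sum(n[i][j] for i in range(r)) for j in range(s)]
    n_all = sum(n_row)
    by_row = [[n[i][j] * N_row[i] / n_row[i] for j in range(s)] for i in range(r)]
    by_col = [[n[i][j] * N_col[j] / n_col[j] for j in range(s)] for i in range(r)]
    overall = [[n[i][j] * N_total / n_all for j in range(s)] for i in range(r)]
    return by_row, by_col, overall


# Hypothetical sample n_ij and universe margins N_i. = (400, 600), N_.j = (450, 550)
sample = [[30, 10], [20, 40]]
by_row, by_col, overall = three_estimates(sample, [400, 600], [450, 550], 1000)
print([sum(col) for col in zip(*by_row)])   # column totals of the row-based estimates
print([sum(row) for row in by_col])          # row totals of the column-based estimates
```

Here the row-based estimates give column totals of 500 and 500 instead of 450 and 550, and the column-based estimates give row totals of 380 and 620 instead of 400 and 600.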

In any problem of adjustment where the conditions are intricate it is necessary to have a method that is straightforward and self-checking; this becomes imperative when we realize that in the three-dimensional Case VII of the problem now at hand (vide infra), any adjustment in one cell must be balanced by adjustments in at least seven others. The method of least squares is one possible procedure for effecting an adjustment and at the same time enforcing certain conditions among the marginal totals. It is essentially a scheme for

¹ Examples will occur in the 1940 census publications. Further discussion of this problem and of the sampling procedure is given by the authors in “The sampling procedure of the 1940 population census,” Jour. Am. Stat. Assn., Vol. 36 (1940), pp. 615-630.
arriving at a set of calculated or adjusted observations that will satisfy the conditions of the problem, and at the same time minimize the sum of the weighted squares of the residuals, symbolized as

$$S = \sum w(n_c - n_o)^2, \tag{1}$$

$n_c$ and $n_o$ being the calculated and observed numbers in a cell, and $n_c - n_o$ the corresponding residual. It is the nature of the conditions imposed on the adjusted values that distinguishes one type of problem from another. Least squares has the practical advantage of uniqueness, once the weights of the observations have been assigned, and it possesses the theoretical dignity of giving one kind of “best” estimates under ideal conditions of sampling. For our present purpose we shall minimize sums of the form

$$S = \sum (m_i - n_i)^2/n_i, \tag{2}$$

$n_i$ being the observed frequency in the $i$th cell, and $m_i$ the calculated or adjusted frequency therein. The conditions among the $m_i$ will arise from the fact that the marginal totals, after adjustment, must agree with their expected values, namely, the deflated marginal totals of the universe (for example, $m_{i.}$ and $m_{.j}$ as defined in eqs. (6) and (7)).

By definition, weight and variance are inversely proportional; hence the principle of least squares is identical with the minimizing of chi-square. Here the variance in the $i$th cell is $\nu_i(1 - \nu_i/n)$, where $\nu_i$ is the expected number in that cell and $n$ is the total number in the sample. Now if $\nu_i$ is sufficiently well approximated by $n_i$, it follows that if no cell contains an appreciable fraction of the whole sample (a circumstance requiring a fair-sized number of cells, perhaps 100), the variance may be taken as $\nu_i$ for every $i$, and the minimized $S$ can be used as chi-square. But regardless of the number of cells, if the $n_i$ be not too much different from one another, so that the factor $1 - \nu_i/n$ may be treated as a constant, we still get the least squares solution by minimizing $S$ as defined in eq. (2).

2. The two dimensional problem. Suppose that the data on two characteristics (e.g., age and highest grade of school completed) are obtained for each member of a universe of $N$ individuals, and that tabulations of the data provide either (a) one set of marginal totals $N_{1.}, N_{2.}, \ldots, N_{r.}$; or (b) in addition, the marginal totals $N_{.1}, N_{.2}, \ldots, N_{.s}$. The nature of the tabulations is presumed such that it is not feasible to count the numbers $N_{ij}$ in the cells, as would be done if one character were crossed with the other. Suppose, however, that for a sample of $n$ individuals selected in a random manner from the universe, the two characters are crossed with each other, so that we know not only all the $s + r$ marginal totals $n_{.1}, \ldots, n_{.s}, n_{1.}, \ldots, n_{r.}$ of the sample but also the numbers $n_{ij}$ ($i = 1, 2, \ldots, r$; $j = 1, 2, \ldots, s$). The problem is to estimate the unknown frequencies in the cells of the universe. This will be done by finding the calculated or adjusted sample frequencies $m_{ij}$ and then inflating them by the inverse sampling ratio $N/n$.





For the least squares solution we seek those values of $m_{ij}$ that minimize²

$$S = \sum (m_{ij} - n_{ij})^2/n_{ij}, \tag{3}$$

wherein the $m_{ij}$ are subjected to one of the following sets of conditions:

Case I: One set of marginal totals known. Assume $N_{1.}, N_{2.}, \ldots, N_{r.}$ to be known. Then we require

$$\sum_j m_{ij} = m_{i.}, \qquad i = 1, 2, \ldots, r. \tag{4}$$

These $r$ equations constitute $r$ conditions on the adjusted $m_{ij}$.


Fig. 1. Showing the System of Notation for the Cell Frequencies and Marginal Totals of the Universe and the Sample in the Two Dimensional Problem. (Universe: cell frequencies $N_{ij}$ unknown; marginal totals $N_{i.}$, $N_{.j}$ and $N$ known. Sample: cell frequencies $n_{ij}$, marginal totals $n_{i.}$, $n_{.j}$ and $n$ known.)

Case II: Both sets of marginal totals known. Here the adjusted cell frequencies must satisfy not only condition (4) but also

$$\sum_i m_{ij} = m_{.j}, \qquad j = 1, 2, \ldots, s-1, \tag{5}$$

there being now a total of $r + s - 1$ conditions. In both cases,

$$m_{i.} = N_{i.}n/N, \tag{6}$$

$$m_{.j} = N_{.j}n/N. \tag{7}$$

In other words, $m_{i.}$ and $m_{.j}$ are the deflated marginal totals, i.e., $N_{i.}$ and $N_{.j}$ divided by the actual sampling ratio $N/n$. The $m_{i.}$ and $m_{.j}$ are not independent, for

² The sign $\sum$ will denote summation over all possible cells, unless otherwise noted. $\sum_i$ will denote summation over all values of $i$, and similarly for an inferior $j$ or $k$. The dot, as in $n_{.j}$, will signify the result of summing the $n_{ij}$ over all values of $i$ in the $j$th column.
$$N_{.1} + N_{.2} + \cdots + N_{.s} = N_{1.} + N_{2.} + \cdots + N_{r.} = N. \tag{8}$$

It is for this reason that if $i$ runs through all $r$ values in eq. (4), then $j$ can run through only $s - 1$ in eq. (5). A similar equation also exists for the marginal totals of the sample, namely,

$$n_{.1} + n_{.2} + \cdots + n_{.s} = n_{1.} + n_{2.} + \cdots + n_{r.} = n. \tag{9}$$

Solution of the two dimensional Case I. Assuming that the adjusted values of the $m_{ij}$ have been found, let each take on a small variation $\delta m_{ij}$; then the differentials of eqs. (3) and (4) show that

$$\tfrac12\,\delta S = \sum \{(m_{ij} - n_{ij})/n_{ij}\}\,\delta m_{ij} = 0 \quad \text{(one equation)}, \tag{10}$$

$$\sum_j \delta m_{ij} = 0, \qquad i = 1, 2, \ldots, r \quad (r \text{ equations}). \tag{11}$$

Multiply now eq. (11$_i$) by the arbitrary Lagrange multiplier $-\lambda_{i.}$, and add eqs. (10) and (11) to obtain

$$\sum \{(m_{ij} - n_{ij})/n_{ij} - \lambda_{i.}\}\,\delta m_{ij} = 0 \quad \text{(one equation)}. \tag{12}$$

By the usual argument, one may now set each brace equal to zero, recognizing that the $r$ Lagrange multipliers are then no longer arbitrary but must satisfy the relation

$$m_{ij} = n_{ij}(1 + \lambda_{i.}). \tag{13}$$

The adjusted frequencies $m_{ij}$ can be computed at once as soon as the $\lambda_{i.}$ are found. To evaluate them one may rewrite the conditions (4) using the right-hand member of (13) for $m_{ij}$, obtaining

$$m_{i.} = n_{i.}(1 + \lambda_{i.}). \tag{14}$$

Another way to arrive at this same relation is to sum each member of eq. (13) in the $i$th row. However obtained, $\lambda_{i.}$ is now known, since $m_{i.}$ and $n_{i.}$ are known, and in fact eq. (13) now gives

$$m_{ij} = n_{ij}(m_{i.}/n_{i.}). \tag{15}$$

The adjustment is thus a simple proportionate one by rows, the cells in any one 
row all being raised or lowered by the proportionate adjustment in the row total. 
Case I thus amounts to r independent one dimensional proportionate adjust- 
ments, one for each row, and any one or all may be carried out, as desired. 
This result can be obtained by a simpler approach but is presented in this way 
for consistency with later cases. 

The minimized sum of squares may be computed directly, or from the row totals by seeing that

$$S = \sum_i (m_{i.} - n_{i.})^2/n_{i.}. \tag{16}$$

The term $(m_{i.} - n_{i.})^2/n_{i.}$ for the $i$th row may be considered separately, and used as $\chi^2$ with $s - 1$ degrees of freedom, or all rows may be combined into the minimized $S$ as given in eq. (16), and used as $\chi^2$ with $r(s - 1)$ degrees of freedom.
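The Case I solution amounts to a proportional rescaling of each row. The sketch below applies (15) row by row and checks that the shortcut (16) gives the same minimized $S$ as evaluating (3) directly. The sample table and the deflated row totals $m_{i.}$ are hypothetical, and so is the function name.

```python
def adjust_case_i(n, m_row):
    """Two-dimensional Case I: eq. (15) row by row, plus the minimized S of eq. (16)."""
    m, S = [], 0.0
    for row, target in zip(n, m_row):
        n_i = sum(row)
        m.append([x * target / n_i for x in row])   # m_ij = n_ij * m_i. / n_i.
        S += (target - n_i) ** 2 / n_i             # (m_i. - n_i.)^2 / n_i.
    return m, S


sample = [[30, 10], [20, 40]]                      # hypothetical n_ij
m, S = adjust_case_i(sample, [44, 56])             # hypothetical m_i.
direct = sum((m[i][j] - sample[i][j]) ** 2 / sample[i][j]
             for i in range(2) for j in range(2))  # eq. (3) evaluated directly
print(m, S, direct)
```

Both routes give $S = 2/3$ here, since each row contributes $(m_{i.} - n_{i.})^2/n_{i.}$ once its cells are scaled by the same factor.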

Solution of the two dimensional Case II. In addition to eqs. (11) we now have also

$$\sum_i \delta m_{ij} = 0, \qquad j = 1, 2, \ldots, s-1, \tag{17}$$

which comes by differentiating eqs. (5). By addition of eqs. (10), (11), and (17), after multiplying eq. (11$_i$) by $-\lambda_{i.}$ and eq. (17$_j$) by $-\lambda_{.j}$, we obtain

$$\sum \{(m_{ij} - n_{ij})/n_{ij} - \lambda_{i.} - \lambda_{.j}\}\,\delta m_{ij} = 0. \tag{18}$$

Equating each brace to zero, as before, we find that

$$m_{ij} = n_{ij}(1 + \lambda_{i.} + \lambda_{.j}), \tag{19}$$

wherein $\lambda_{.s}$ is to be counted 0. The adjustment is now no longer proportionate by rows, but involves every cell.

To evaluate the Lagrange multipliers in eq. (19) we may sum the two members downward and across in Fig. 1 and obtain the $r + s - 1$ normal equations

$$\begin{aligned} n_{i.}\lambda_{i.} + \sum_j n_{ij}\lambda_{.j} &= m_{i.} - n_{i.}, & i &= 1, 2, \ldots, r, \\ \sum_i n_{ij}\lambda_{i.} + n_{.j}\lambda_{.j} &= m_{.j} - n_{.j}, & j &= 1, 2, \ldots, s-1. \end{aligned} \tag{20}$$

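These can be reduced for numerical computation. The top row solved for $\lambda_{i.}$ gives

$$\lambda_{i.} = (1/n_{i.})\Big\{m_{i.} - \sum_j n_{ij}\lambda_{.j}\Big\} - 1, \tag{21}$$

whereupon by substitution into the bottom row of eqs. (20) we arrive at the $s - 1$ normal equations

$$\begin{array}{c|cccc|l}
 & \lambda_{.1} & \lambda_{.2} & \cdots & \lambda_{.s-1} & \\ \hline
1 & n_{.1} - \sum_i \frac{n_{i1}^2}{n_{i.}} & -\sum_i \frac{n_{i1}n_{i2}}{n_{i.}} & \cdots & -\sum_i \frac{n_{i1}n_{i,s-1}}{n_{i.}} & = m_{.1} - \sum_i \frac{n_{i1}m_{i.}}{n_{i.}} \\
2 & & n_{.2} - \sum_i \frac{n_{i2}^2}{n_{i.}} & \cdots & -\sum_i \frac{n_{i2}n_{i,s-1}}{n_{i.}} & = m_{.2} - \sum_i \frac{n_{i2}m_{i.}}{n_{i.}} \\
\vdots & & & \ddots & \vdots & \quad \vdots \\
s-1 & & & & n_{.s-1} - \sum_i \frac{n_{i,s-1}^2}{n_{i.}} & = m_{.s-1} - \sum_i \frac{n_{i,s-1}m_{i.}}{n_{i.}} \\
s & & & & & \quad 0
\end{array} \tag{22}$$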

Because of symmetry in the coefficients, those below the diagonal are not shown; indeed, in a systematic computation, they are not used. The 0 in the bottom row is appended for the computation of the minimized $S$, if desired. The number of Lagrange multipliers to be solved for directly is $s - 1$, and the remaining ones come by substitution into eq. (21), $\lambda_{.s}$ being counted 0.

A simple procedure for calculating the coefficients in the normal equations (22) is to set up a preparatory table by dividing each $n_{ij}$ in the $i$th row by $\sqrt{n_{i.}}$, and also to write down $m_{i.}/\sqrt{n_{i.}}$ for that row, for use on the right-hand side of the normal equations (compare Tables I and II). In machine calculation the constant divisor $\sqrt{n_{i.}}$ would be left on the keyboard until the entire $i$th row is divided; or, if reciprocal multiplication is preferred, the multiplier $1/\sqrt{n_{i.}}$ would be left on. From this preparatory table, the cumulation of squares and cross-products in the vertical gives the required summations for the coefficients. The sum check would be applied in the usual manner.
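The whole Case II procedure can be sketched as follows: form the normal equations (22), solve them for $\lambda_{.1}, \ldots, \lambda_{.s-1}$ with $\lambda_{.s} = 0$, obtain the $\lambda_{i.}$ from (21), and apply (19). This is a plain-Python sketch under the assumption of a small table with positive cells. A generic elimination routine stands in for the hand computation with the preparatory table, and all names and data are hypothetical.

```python
def solve(A, b):
    """Gaussian elimination with partial pivoting for a small dense system."""
    k = len(b)
    M = [list(map(float, row)) + [float(rhs)] for row, rhs in zip(A, b)]
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, k):
            f = M[r][c] / M[c][c]
            for cc in range(c, k + 1):
                M[r][cc] -= f * M[c][cc]
    x = [0.0] * k
    for c in reversed(range(k)):
        x[c] = (M[c][k] - sum(M[c][cc] * x[cc] for cc in range(c + 1, k))) / M[c][c]
    return x


def adjust_case_ii(n, m_row, m_col):
    """Two-dimensional Case II via eqs. (22), (21) and (19); lambda_.s is taken as 0."""
    r, s = len(n), len(n[0])
    n_row = [sum(row) for row in n]
    n_col = [sum(n[i][j] for i in range(r)) for j in range(s)]
    # Normal equations (22) for lambda_.1 ... lambda_.(s-1)
    A = [[(n_col[j] if j == k else 0.0)
          - sum(n[i][j] * n[i][k] / n_row[i] for i in range(r))
          for k in range(s - 1)] for j in range(s - 1)]
    b = [m_col[j] - sum(n[i][j] * m_row[i] / n_row[i] for i in range(r))
         for j in range(s - 1)]
    lam_col = solve(A, b) + [0.0]
    # Eq. (21): row multipliers by substitution
    lam_row = [(m_row[i] - sum(n[i][j] * lam_col[j] for j in range(s))) / n_row[i] - 1
               for i in range(r)]
    # Eq. (19): adjusted frequencies
    m = [[n[i][j] * (1 + lam_row[i] + lam_col[j]) for j in range(s)] for i in range(r)]
    return m, lam_row, lam_col


sample = [[30, 10, 20], [20, 40, 30]]   # hypothetical n_ij
m, lam_row, lam_col = adjust_case_ii(sample, [62, 88], [52, 48, 50])
print([round(sum(row), 6) for row in m], [round(sum(col), 6) for col in zip(*m)])
```

The adjusted table reproduces both sets of prescribed margins. The last column is satisfied automatically because the row totals and the grand total agree.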


3. A numerical example of the two dimensional Case II. The fact is that in practice one need not bother about forming and solving the normal equations, because they will be displaced by a simplifying iterative procedure, to be explained in a later section. For illustration, however, we may do an example both ways, first using the normal equations and the adjustment (19), later on accomplishing the same results by the quicker method.

We may start with the unitalicized numbers in the 4 × 6 array of Table I, assuming these to be the sample frequencies $n_{ij}$ to be adjusted. Actually, they were obtained by deflating to 1/20th (for a supposed 5 per cent sample) the New England age × state table on p. 1108 of vol. 2 of the Fifteenth Census of the U. S., 1930, then varying the deflated values by chance with Tippett’s numbers to get our sample frequencies $n_{ij}$. The italicized entries in Table I represent the final (adjusted) $m_{ij}$, and it is these that we now set out to get. We start off with the sample frequencies $n_{ij}$ and the known marginal totals $m_{1.}$, $m_{.1}$, etc., where $m_{i.} = N_{i.}n/N$ and $m_{.j} = N_{.j}n/N$, as in eqs. (6) and (7). The Lagrange multipliers shown along the left-hand and top borders arise in the calculations now to be undertaken.

Table II is the preparatory table advised at the close of the last section. It is derived from Table I by dividing the $i$th row of sample frequencies by $\sqrt{n_{i.}}$. For example, the entry 8.64 in the cell $i = 3$, $j = 2$ comes by dividing 419 by $\sqrt{2352}$, 419 being the entry in the cell of the same indices in Table I, and 2352 being the sum of the third row. The sums at the bottom and right-hand side are for checking the formation of the normal equations. The cumulations of squares and cross-products along the vertical give the summations required for the normal eqs. (22), which now appear numerically as eqs. (23).


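$$\begin{array}{c|ccc|l}
\text{No.} & \lambda_{.1} & \lambda_{.2} & \lambda_{.3} & \\ \hline
1 & 7413 & -3549 & -2354 & = 3197 \times 10^{-2} \\
2 & & 4441 & -544 & = 2356 \times 10^{-2} \\
3 & & & 3129 & = -3222 \times 10^{-2} \\
4 & & & & \quad 0
\end{array} \tag{23}$$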
Performing the solution by any favorite procedure one will obtain

$$\lambda_{.1} = .01182, \qquad \lambda_{.2} = .01490, \qquad \lambda_{.3} = .00119, \tag{24}$$

TABLE I

A table of artificial sample frequencies, an artificial 5 per cent sample of native white persons of native white parentage attending school, by age by state, New England, 1930. The adjusted frequency m_ij in each cell is shown italicized just below the corresponding sample frequency n_ij

                                     Age j:     1        2       3       4     Total
                                     λ.j:    .0118    .0149   .0012      0
State            i    λi.
Maine            1   -.0146   n_ij          3623      781     557     313      5274
                              m_ij          3613      781     550     308      5252
New Hampshire    2   -.0003   n_ij          1570      395     251     155      2371
                              m_ij          1588      401     251     155      2395
Vermont          3   +.0234   n_ij          1553      419     264     116      2352
                              m_ij          1608      435     270     119      2432
Massachusetts    4   -.0162   n_ij         10538     2455    1706    1160     15859
                              m_ij         10492     2452    1680    1141     15766
Rhode Island     5   -.0230   n_ij          1681      353     171     154      2359
                              m_ij          1662      350     167     150      2330
Connecticut      6   -.0034   n_ij          3882      857     544     339      5622
                              m_ij          3916      867     543     338      5662
Total                         n.j          22847     5260    3493    2237     33837
                              m.j          22877     5285    3462    2213     33837

The adjusted (italicized) m_ij are rounded off, hence when summed may occasionally disagree a unit or so with the expected marginal totals (also italicized); the latter arise by deflation from the universe rather than by direct addition of the m_ij.


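whereupon by substitution into eq. (21) comes

$$\begin{aligned} \lambda_{1.} &= -.0146, & \lambda_{4.} &= -.0162, \\ \lambda_{2.} &= -.0003, & \lambda_{5.} &= -.0230, \\ \lambda_{3.} &= +.0234, & \lambda_{6.} &= -.0034. \end{aligned} \tag{25}$$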
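The solution (24) of the numerical normal equations (23) can be checked directly. The sketch below reads the right-hand sides of (23) as 31.97, 23.56 and −32.22 (that is, 3197 × 10⁻² and so on) and fills in the symmetric coefficients. It uses Cramer's rule, which is practical only for a system this small; it is not the elimination scheme the authors recommend. The names are hypothetical.

```python
def det3(M):
    """Determinant of a 3 x 3 matrix given as nested lists."""
    (a, b, c), (d, e, f), (g, h, i) = M
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


# Eqs. (23): symmetric coefficients and right-hand sides
A = [[7413, -3549, -2354],
     [-3549, 4441, -544],
     [-2354, -544, 3129]]
rhs = [31.97, 23.56, -32.22]
D = det3(A)
lam = []
for k in range(3):
    # Cramer's rule: replace column k by the right-hand sides
    Ak = [row[:k] + [v] + row[k + 1:] for row, v in zip(A, rhs)]
    lam.append(det3(Ak) / D)
print([round(x, 5) for x in lam])
```

The computed multipliers agree with $\lambda_{.1} = .01182$, $\lambda_{.2} = .01490$, $\lambda_{.3} = .00119$ of eq. (24) to within the rounding of the coefficients.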

The next step is to compute the $m_{ij}$ by eq. (19). Table I is now bordered with the Lagrange multipliers for a convenient arrangement of the factors required, and the calculation is completed. It will be noted that, for example,

$$m_{32} = 419(1 + .0234 + .0149) = 435. \tag{26}$$

The $m_{ij}$ thus calculated are shown italicized in Table I. The marginal totals, found by adding the $m_{ij}$ just calculated, do not agree exactly everywhere with the expected totals, because of rounding off to integers. The errors of closure, however, are slight, and it is a simple matter to raise or lower some of the larger cells by a unit or two to force exact satisfaction of the conditions, if this is desired.
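Eq. (19) can be applied to a whole row at once. The sketch below uses the Vermont row ($i = 3$), taking its sample frequencies as 1553, 419, 264, 116. These are the values consistent with the entries of the preparatory table (for example, 419/√2352 = 8.64). The multipliers are those of eqs. (24) and (25), and the variable names are hypothetical.

```python
# Row i = 3 (Vermont): sample frequencies n_3j, lambda_3. from eq. (25),
# lambda_.j from eq. (24), with lambda_.4 counted 0
n_row3 = [1553, 419, 264, 116]
lam_3 = 0.0234
lam_col = [0.01182, 0.01490, 0.00119, 0.0]
m_row3 = [round(n * (1 + lam_3 + lam_j)) for n, lam_j in zip(n_row3, lam_col)]
print(m_row3, sum(m_row3))
```

The rounded results are 1608, 435, 270 and 119, which add to 2432, the deflated total $m_{3.}$ for that row.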

4. The three dimensional problem. Here the $N$ cards of the universe are sorted and counted for one and perhaps a second and third characteristic, and possibly crossed by pairs in various combinations (Cases I-VII). The sample of $n$, however, is crossed by all three characteristics, which is to say that the

TABLE II

This comes by dividing each sample frequency in Table I by the corresponding √n_i. (This operation would ordinarily be done a row at a time.)

            j = 1    j = 2    j = 3    j = 4    m_i./√n_i.     Sum
  i = 1     49.89    10.75     7.67     4.31       72.32      144.94
  i = 2     32.24     8.11     5.16     3.18       49.19       97.87
  i = 3     32.02     8.64     5.44     2.39       50.15       98.64
  i = 4     83.68    19.49    13.55     9.21      125.19      251.12
  i = 5     34.61     7.27     3.52     3.17       47.97       96.54
  i = 6     51.77    11.43     7.26     4.52       75.51      150.49
  Sum      284.21    65.69    42.59    26.78      420.33      839.60


cell frequencies n,-,* are all known (refer to J'ig. 2). As before, the adjusted 
frequencies are required. 

Case I: One set of slice totals known. Assume the slice totals $N_{1\cdot\cdot}, N_{2\cdot\cdot}, \ldots, N_{r\cdot\cdot}$ to be known. The conditions are then

$$\sum_{jk} m_{ijk} = m_{i\cdot\cdot} = N_{i\cdot\cdot}\,n/N, \qquad i = 1, 2, \ldots, r, \tag{27}$$

being $r$ in number. The summation to be minimized is

$$S = \sum_{ijk} (m_{ijk} - n_{ijk})^2 / n_{ijk}. \tag{28}$$

This is similar to the summation in eq. (3), except that now there are three indices to be summed over instead of two. Following a procedure similar to that used before, we differentiate eqs. (27) and (28) and introduce the $r$ Lagrange multipliers $\lambda_{i\cdot\cdot}$ with eq. (27). The steps are identical with those of the two dimensional Case I, and the result is at once

$$m_{ijk} = n_{ijk}(1 + \lambda_{i\cdot\cdot}) = n_{ijk}(m_{i\cdot\cdot}/n_{i\cdot\cdot}). \tag{29}$$

This adjustment, like that shown by eq. (15), is a simple proportionate one, but 
this time by slices rather than by columns. All cell frequencies having the same 
i index are raised or lowered in the same proportion. 
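Eq. (29) needs no normal equations at all. Here is a minimal NumPy sketch; the small table and the targets are invented for illustration only.

```python
import numpy as np


def adjust_by_slices(n, m_slice):
    """Eq. (29): m_ijk = n_ijk * (m_i.. / n_i..), one common factor per i-slice."""
    n = np.asarray(n, dtype=float)
    factor = np.asarray(m_slice, dtype=float) / n.sum(axis=(1, 2))
    return n * factor[:, None, None]


# Hypothetical 2 x 3 x 2 sample and slice targets m_i.., for illustration only.
n = np.array([[[4, 6], [5, 5], [2, 8]],
              [[10, 3], [7, 7], [1, 2]]], dtype=float)
m = adjust_by_slices(n, [33.0, 27.0])
print(m.sum(axis=(1, 2)))   # the slice totals now equal the targets
```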



Fig. 2. Showing the System of Notation for the Cell Frequencies and Marginal Totals in the Three Dimensional Sample

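Case II: Two sets of slice totals known. Here, in addition to the slice totals of Case I, we know also

$$N_{\cdot 1\cdot},\ N_{\cdot 2\cdot},\ \ldots,\ N_{\cdot s\cdot},$$

whence arise the $s - 1$ additional conditions

$$\sum_{ik} m_{ijk} = m_{\cdot j\cdot} = N_{\cdot j\cdot}\,n/N, \qquad j = 1, 2, \ldots, s - 1. \tag{30}$$

Using the Lagrange multipliers $\lambda_{\cdot j\cdot}$ here, and $\lambda_{i\cdot\cdot}$ with eq. (27) as before, we find that

$$m_{ijk} = n_{ijk}(1 + \lambda_{i\cdot\cdot} + \lambda_{\cdot j\cdot}), \tag{31}$$

in which $\lambda_{\cdot s\cdot}$ is to be counted zero. This adjustment is proportionate by tubes. The ratio is constant along the $ij$-th tube and in fact equal to $m_{ij\cdot}/n_{ij\cdot}$, independent of $k$. Unfortunately, we do not here know the face totals $m_{ij\cdot}$, so we cannot make use of the proportionality as we shall in Case IV. To solve for the $r + s - 1$ Lagrange multipliers, we sum the members of eq. (31) over $j$ and $k$, and then over $i$ and $k$, and arrive at the normal equations

$$\begin{aligned}
n_{i\cdot\cdot}\lambda_{i\cdot\cdot} + \sum_j n_{ij\cdot}\lambda_{\cdot j\cdot} &= m_{i\cdot\cdot} - n_{i\cdot\cdot}, & i &= 1, 2, \ldots, r,\\
\sum_i n_{ij\cdot}\lambda_{i\cdot\cdot} + n_{\cdot j\cdot}\lambda_{\cdot j\cdot} &= m_{\cdot j\cdot} - n_{\cdot j\cdot}, & j &= 1, 2, \ldots, s - 1.
\end{aligned} \tag{32}$$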
These can be reduced to s — 1 equations in precisely the same way that eqs. 
(20) were reduced, but because of the iterative process to come further on, we 
shall not pursue the reduction here. 

Case III: All three sets of slice totals known. All slice totals

$$\begin{aligned}
&N_{1\cdot\cdot},\ N_{2\cdot\cdot},\ \ldots,\ N_{r\cdot\cdot}\\
&N_{\cdot 1\cdot},\ N_{\cdot 2\cdot},\ \ldots,\ N_{\cdot s\cdot}\\
&N_{\cdot\cdot 1},\ N_{\cdot\cdot 2},\ \ldots,\ N_{\cdot\cdot t}
\end{aligned}$$

are now known. In addition to conditions (27) and (30), we require here

$$\sum_{ij} m_{ijk} = m_{\cdot\cdot k} = N_{\cdot\cdot k}\,n/N, \qquad k = 1, 2, \ldots, t - 1, \tag{33}$$

which makes a total of $r + (s - 1) + (t - 1)$, or $r + s + t - 2$, conditions. The same kind of manipulation as used heretofore gives

$$m_{ijk} = n_{ijk}(1 + \lambda_{i\cdot\cdot} + \lambda_{\cdot j\cdot} + \lambda_{\cdot\cdot k}), \tag{34}$$

with $\lambda_{\cdot s\cdot}$ and $\lambda_{\cdot\cdot t}$ to be counted zero. The adjustment is no longer proportionate by slices or tubes, but involves every cell. In practice, once the normal equations are solved and the Lagrange multipliers worked out, one proceeds very much as in the two dimensional Case II. For each of the $t$ slices, corresponding to the $t$ values of $k$, there will be a two dimensional adjustment, with the 1 in eq. (19) now replaced by $1 + \lambda_{\cdot\cdot k}$.

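The normal equations for the Lagrange multipliers can be found by performing double summations on eq. (34). The result is

$$\begin{aligned}
n_{i\cdot\cdot}\lambda_{i\cdot\cdot} + \sum_j n_{ij\cdot}\lambda_{\cdot j\cdot} + \sum_k n_{i\cdot k}\lambda_{\cdot\cdot k} &= m_{i\cdot\cdot} - n_{i\cdot\cdot}, & i &= 1, 2, \ldots, r,\\
\sum_i n_{ij\cdot}\lambda_{i\cdot\cdot} + n_{\cdot j\cdot}\lambda_{\cdot j\cdot} + \sum_k n_{\cdot jk}\lambda_{\cdot\cdot k} &= m_{\cdot j\cdot} - n_{\cdot j\cdot}, & j &= 1, 2, \ldots, s - 1,\\
\sum_i n_{i\cdot k}\lambda_{i\cdot\cdot} + \sum_j n_{\cdot jk}\lambda_{\cdot j\cdot} + n_{\cdot\cdot k}\lambda_{\cdot\cdot k} &= m_{\cdot\cdot k} - n_{\cdot\cdot k}, & k &= 1, 2, \ldots, t - 1.
\end{aligned} \tag{35}$$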
If these calculations were to be carried out, one would simplify the computation by solving the top row for $\lambda_{i\cdot\cdot}$, getting

$$\lambda_{i\cdot\cdot} = \frac{m_{i\cdot\cdot}}{n_{i\cdot\cdot}} - 1 - \frac{1}{n_{i\cdot\cdot}}\sum_j n_{ij\cdot}\lambda_{\cdot j\cdot} - \frac{1}{n_{i\cdot\cdot}}\sum_k n_{i\cdot k}\lambda_{\cdot\cdot k}. \tag{36}$$

Substituting this into the middle and last rows of eqs. (35) gives a reduced set of $s + t - 2$ normal equations for the Lagrange multipliers $\lambda_{\cdot j\cdot}$ and $\lambda_{\cdot\cdot k}$. Their numerical values, when set back into eq. (36), give the $\lambda_{i\cdot\cdot}$. In all the summations of eqs. (35) and (36), $\lambda_{\cdot s\cdot}$ and $\lambda_{\cdot\cdot t}$ would be counted zero. But here again, the iterative process to be explained later will displace the use of normal equations, so actually we are not interested in reducing them.
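All of the non-proportionate cases share one matrix structure. Let $C$ be the 0/1 matrix that forms the required marginal totals from the cell frequencies. Minimizing eq. (28) subject to $Cm = b$ then gives:

- the adjustment $m = n(1 + C^{\mathsf T}\lambda)$;
- the normal equations $(C\,\mathrm{diag}(n)\,C^{\mathsf T})\lambda = b - Cn$, of which eqs. (35) are the written-out form.

The sketch below is our own generic formulation, not the authors' routine. Instead of counting $\lambda_{\cdot s\cdot}$ and $\lambda_{\cdot\cdot t}$ zero, it keeps every condition and lets a least-squares solver pick one of the equivalent multiplier vectors. This leaves the adjusted $m_{ijk}$ unchanged. The table and targets are random hypothetical data.

```python
import numpy as np


def marginal_matrix(shape, keep_axes):
    """0/1 matrix C such that C @ m.ravel() sums m over every axis not in keep_axes."""
    size = int(np.prod(shape))
    marg_shape = tuple(shape[a] for a in keep_axes)
    C = np.zeros((int(np.prod(marg_shape)), size))
    for p in range(size):
        coords = np.unravel_index(p, shape)
        q = np.ravel_multi_index(tuple(coords[a] for a in keep_axes), marg_shape)
        C[q, p] = 1.0
    return C


def least_squares_adjust(n, constraints):
    """Minimise sum (m - n)^2 / n subject to marginal conditions.

    constraints: list of (keep_axes, target) pairs, keep_axes sorted ascending.
    Stationarity gives m = n * (1 + C.T @ lam) with normal equations
    (C diag(n) C.T) lam = target - C @ n. Redundant conditions are kept; the
    explicit rcond cuts off the (exactly) zero singular values they create.
    """
    n = np.asarray(n, dtype=float)
    C = np.vstack([marginal_matrix(n.shape, axes) for axes, _ in constraints])
    b = np.concatenate([np.asarray(tg, dtype=float).ravel() for _, tg in constraints])
    nf = n.ravel()
    A = C @ (nf[:, None] * C.T)
    lam, *_ = np.linalg.lstsq(A, b - C @ nf, rcond=1e-10)
    return (nf * (1.0 + C.T @ lam)).reshape(n.shape)


rng = np.random.default_rng(0)
n = rng.integers(5, 40, size=(2, 3, 4)).astype(float)   # hypothetical sample
u = rng.integers(5, 40, size=(2, 3, 4)).astype(float)   # hypothetical source of targets
case_iii = [((0,), u.sum(axis=(1, 2))), ((1,), u.sum(axis=(0, 2))), ((2,), u.sum(axis=(0, 1)))]
m = least_squares_adjust(n, case_iii)
print(m.sum(axis=(1, 2)), u.sum(axis=(1, 2)))
```

With only the first set of conditions, the same function reduces to the proportionate eq. (29).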

Case IV: One set of face totals known. It may be that the $rs$ face totals

$$N_{11\cdot},\ N_{12\cdot},\ \ldots,\ N_{ij\cdot},\ \ldots,\ N_{rs\cdot}$$

are known from crossing the $i$ and $j$ characters in the universe. The conditions are then

$$\sum_k m_{ijk} = m_{ij\cdot} = N_{ij\cdot}\,n/N, \qquad i = 1, 2, \ldots, r;\ \ j = 1, 2, \ldots, s. \tag{37}$$

The adjustment here turns out to be

$$m_{ijk} = n_{ijk}(1 + \lambda_{ij\cdot}); \tag{38}$$

but by summing both sides over the index $k$ to evaluate $\lambda_{ij\cdot}$, it is seen that

$$1 + \lambda_{ij\cdot} = m_{ij\cdot}/n_{ij\cdot}, \tag{39}$$

whence

$$m_{ijk} = n_{ijk}(m_{ij\cdot}/n_{ij\cdot}). \tag{40}$$

This adjustment is thus proportionate by tubes, like that in eq. (31). Here, however, the factor is known, and eq. (40) can be applied at once.
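Because eq. (40) is a direct formula, Case IV is a one-line operation. A sketch with invented numbers:

```python
import numpy as np


def adjust_by_tubes(n, m_face):
    """Eq. (40): m_ijk = n_ijk * (m_ij. / n_ij.), a fixed factor along each ij-tube."""
    n = np.asarray(n, dtype=float)
    factor = np.asarray(m_face, dtype=float) / n.sum(axis=2)
    return n * factor[:, :, None]


# Hypothetical 2 x 2 x 3 sample and face totals m_ij., for illustration only.
n = np.array([[[2, 3, 5], [4, 4, 2]],
              [[1, 1, 8], [6, 3, 1]]], dtype=float)
m_face = np.array([[12.0, 9.0], [11.0, 10.0]])
m = adjust_by_tubes(n, m_face)
print(m.sum(axis=2))   # reproduces m_face
```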

Case V: One set of face totals and one set of slice totals known. Sometimes, in addition to the $rs$ face totals of Case IV, the slice totals

$$N_{\cdot\cdot 1},\ N_{\cdot\cdot 2},\ \ldots,\ N_{\cdot\cdot t}$$

will also be known. In these circumstances, the conditions (37) are to be accompanied by

$$\sum_{ij} m_{ijk} = m_{\cdot\cdot k} = N_{\cdot\cdot k}\,n/N, \qquad k = 1, 2, \ldots, t - 1. \tag{41}$$

The same procedure as previously applied yields now

$$m_{ijk} = n_{ijk}(1 + \lambda_{ij\cdot} + \lambda_{\cdot\cdot k}), \tag{42}$$

with $\lambda_{\cdot\cdot t}$ to be counted zero. Summations performed over $k$, and then over $i$ and $j$ together, give the normal equations

$$\begin{aligned}
n_{ij\cdot}\lambda_{ij\cdot} + \sum_k n_{ijk}\lambda_{\cdot\cdot k} &= m_{ij\cdot} - n_{ij\cdot},\\
\sum_{ij} n_{ijk}\lambda_{ij\cdot} + n_{\cdot\cdot k}\lambda_{\cdot\cdot k} &= m_{\cdot\cdot k} - n_{\cdot\cdot k}.
\end{aligned} \tag{43}$$

The number of equations is $rs + t - 1$, since $\lambda_{\cdot\cdot t}$ does not exist. As before, a simplification can be effected by solving the top row for $\lambda_{ij\cdot}$ and substituting into the lower one. Because of the great advantage of the iterative process to be seen further on, however, we shall not carry out the reduction.

Before going on, note that although this case is three dimensional, it reduces to the two dimensional Case II. To see this, regard $ij$ as one index running through the values 11, 12, ..., 21, 22, ..., $rs$, and $k$ as a second index running through the values 1, 2, ..., $t$. The reduction can be seen from the similarity between eqs. (43) and (20).

Case VI: Two sets of face totals known. Suppose that, in addition to the face totals of Case IV, the face totals

$$N_{\cdot 11},\ N_{\cdot 12},\ \ldots,\ N_{\cdot st}$$

are also known from further crossing the $j$ and $k$ characters in the universe. We shall then require

$$\sum_i m_{ijk} = m_{\cdot jk} = N_{\cdot jk}\,n/N, \qquad j = 1, 2, \ldots, s;\ \ k = 1, 2, \ldots, t - 1, \tag{44}$$

in addition to the conditions (37). In place of eq. (40) of Case IV, we now find that

$$m_{ijk} = n_{ijk}(1 + \lambda_{ij\cdot} + \lambda_{\cdot jk}), \tag{45}$$

in which $\lambda_{\cdot jt}$ is to be counted zero for all $j$. No simple relation such as eq. (40) is possible here, because the adjustment is not proportionate by tubes, so the Lagrange multipliers must be evaluated. This can be done by summing the members of eq. (45) over $k$ and over $i$ in turn, which gives the normal equations

$$\begin{aligned}
n_{ij\cdot}\lambda_{ij\cdot} + \sum_k n_{ijk}\lambda_{\cdot jk} &= m_{ij\cdot} - n_{ij\cdot},\\
\sum_i n_{ijk}\lambda_{ij\cdot} + n_{\cdot jk}\lambda_{\cdot jk} &= m_{\cdot jk} - n_{\cdot jk}.
\end{aligned} \tag{46}$$

Since $\lambda_{\cdot jt}$ does not exist for any value of $j$, the number of equations is $rs + s(t - 1) = s(r + t - 1)$. They break up at once into $s$ sets, each of $r + t - 1$ equations, one set for every $j$ value. In fact, the problem can be considered as $s$ sets of the two dimensional Case II. Any one value of $j$ gives a slice, which can be looked upon as fulfilling the specifications of the two dimensional Case II. Each set of normal equations can be reduced in the same manner that eqs. (20) were reduced.
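The reductions described for Cases V and VI can be made concrete. The sketch below uses hypothetical random data and our own helper names:

- Case V reshapes the $r \times s \times t$ table into an $rs \times t$ table and solves one two-dimensional Case II problem.
- Case VI solves $s$ independent two-dimensional problems, one for each $i$-by-$k$ slice at fixed $j$.

The two-dimensional solver follows the normal-equation form of eqs. (43) and (46), with the last column multiplier counted zero.

```python
import numpy as np


def solve_two_way(n2, m_row, m_col):
    """Two-dimensional Case II least-squares adjustment (last column multiplier = 0)."""
    r, s = n2.shape
    A = np.block([
        [np.diag(n2.sum(axis=1)), n2[:, : s - 1]],
        [n2[:, : s - 1].T, np.diag(n2.sum(axis=0)[: s - 1])],
    ])
    b = np.concatenate([m_row - n2.sum(axis=1), (m_col - n2.sum(axis=0))[: s - 1]])
    lam = np.linalg.solve(A, b)
    lam_row, lam_col = lam[:r], np.append(lam[r:], 0.0)
    return n2 * (1.0 + lam_row[:, None] + lam_col[None, :])


def case_v(n, m_face_ij, m_slice_k):
    """Case V: treat ij as one index (rs values) and k as the other."""
    r, s, t = n.shape
    m2 = solve_two_way(n.reshape(r * s, t), m_face_ij.reshape(r * s), m_slice_k)
    return m2.reshape(r, s, t)


def case_vi(n, m_face_ij, m_face_jk):
    """Case VI: one independent two-way problem per value of j."""
    m = np.empty_like(n, dtype=float)
    for j in range(n.shape[1]):
        m[:, j, :] = solve_two_way(n[:, j, :], m_face_ij[:, j], m_face_jk[j, :])
    return m


rng = np.random.default_rng(1)
n = rng.integers(5, 40, size=(2, 3, 4)).astype(float)   # hypothetical sample
u = rng.integers(5, 40, size=(2, 3, 4)).astype(float)   # hypothetical source of targets
m5 = case_v(n, u.sum(axis=2), u.sum(axis=(0, 1)))
m6 = case_vi(n, u.sum(axis=2), u.sum(axis=0))
```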

Case VII: All three sets of face totals known. All totals now being known, we require

$$\sum_k m_{ijk} = m_{ij\cdot} = N_{ij\cdot}\,n/N, \qquad i = 1, 2, \ldots, r;\ \ j = 1, 2, \ldots, s, \tag{37}$$

$$\sum_i m_{ijk} = m_{\cdot jk} = N_{\cdot jk}\,n/N, \qquad j = 1, 2, \ldots, s;\ \ k = 1, 2, \ldots, t - 1, \tag{44}$$

$$\sum_j m_{ijk} = m_{i\cdot k} = N_{i\cdot k}\,n/N, \qquad i = 1, 2, \ldots, r - 1;\ \ k = 1, 2, \ldots, t - 1. \tag{47}$$

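The adjusting relation is

$$m_{ijk} = n_{ijk}(1 + \lambda_{ij\cdot} + \lambda_{\cdot jk} + \lambda_{i\cdot k}), \tag{48}$$

in which $\lambda_{\cdot jt}$ is to be counted zero for any $j$, $\lambda_{r\cdot k}$ for any $k$, and $\lambda_{i\cdot t}$ for any $i$. The normal equations for the Lagrange multipliers are

$$\begin{aligned}
n_{ij\cdot}\lambda_{ij\cdot} + \sum_k n_{ijk}\lambda_{\cdot jk} + \sum_k n_{ijk}\lambda_{i\cdot k} &= m_{ij\cdot} - n_{ij\cdot},\\
\sum_i n_{ijk}\lambda_{ij\cdot} + n_{\cdot jk}\lambda_{\cdot jk} + \sum_i n_{ijk}\lambda_{i\cdot k} &= m_{\cdot jk} - n_{\cdot jk},\\
\sum_j n_{ijk}\lambda_{ij\cdot} + \sum_j n_{ijk}\lambda_{\cdot jk} + n_{i\cdot k}\lambda_{i\cdot k} &= m_{i\cdot k} - n_{i\cdot k},
\end{aligned} \tag{49}$$

being $rs + rt + st - r - s - t + 1$ in number. They can be reduced in the same way that previous normal equations have been reduced. Here again, however, the iterative process will make the normal equations unnecessary, except for theoretical purposes, e.g., justification of the iterative process.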
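Each count of independent conditions quoted for the various cases equals the rank of the matrix that forms the corresponding marginal totals. Examples are $r + s + t - 2$ in Case III and $rs + rt + st - r - s - t + 1$ in Case VII. Here is a short NumPy check for an arbitrary $3 \times 4 \times 5$ table. The helper names are ours.

```python
import numpy as np


def marginal_matrix(shape, keep_axes):
    """0/1 matrix summing a table over every axis not in keep_axes."""
    size = int(np.prod(shape))
    marg_shape = tuple(shape[a] for a in keep_axes)
    C = np.zeros((int(np.prod(marg_shape)), size))
    for p in range(size):
        coords = np.unravel_index(p, shape)
        C[np.ravel_multi_index(tuple(coords[a] for a in keep_axes), marg_shape), p] = 1.0
    return C


def independent_conditions(shape, margins):
    """Rank of the stacked marginal matrices = number of independent conditions."""
    C = np.vstack([marginal_matrix(shape, axes) for axes in margins])
    return int(np.linalg.matrix_rank(C, tol=1e-8))


r, s, t = 3, 4, 5
cases = {
    "I": ([(0,)], r),
    "II": ([(0,), (1,)], r + s - 1),
    "III": ([(0,), (1,), (2,)], r + s + t - 2),
    "IV": ([(0, 1)], r * s),
    "V": ([(0, 1), (2,)], r * s + t - 1),
    "VI": ([(0, 1), (1, 2)], s * (r + t - 1)),
    "VII": ([(0, 1), (1, 2), (0, 2)], r * s + r * t + s * t - r - s - t + 1),
}
for name, (margins, expected) in cases.items():
    print(name, independent_conditions((r, s, t), margins), expected)
```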

5. A simplified procedure: iterative proportions. It is well known in least squares that the number of Lagrange multipliers in any problem equals the number of conditions imposed on the adjustment. Here the conditions have appeared in sets, depending on which marginal totals are involved. Compare eqs. (15) and (29) on the one hand with eqs. (19), (31), (34), (42), (45), and (48) on the other. Wherever only one set of marginal totals was involved, we came out with a proportionate adjustment. In all other cases this was not so: the Lagrange multipliers involved were unfortunately related to one another through normal equations. We now observe, however, that as a first approximation the adjustments may all be considered proportionate. We shall be able to write down an expression for the error in this approximation, and to eliminate it by a succession of proportionate adjustments.

Take the two dimensional Case II for an example. In eq. (21) one may recognize $(1/n_{i\cdot})\sum_j n_{ij}\lambda_{\cdot j}$ as a weighted average of $\lambda_{\cdot j}$ for the $i$-th row. There will be such a weighted average for the first row, another for the second, and so on, one for each value of $i$. One may therefore appropriately speak of the $i$-th average of $\lambda_{\cdot j}$, writing it $i\text{-av.}\,\lambda_{\cdot j}$. Substituting from eq. (21) into (19), one then sees the adjustment (19) appear as

$$m_{ij} = n_{ij}(m_{i\cdot}/n_{i\cdot} + \lambda_{\cdot j} - i\text{-av.}\,\lambda_{\cdot j}). \tag{50}$$

If, on the other hand, $\lambda_{\cdot j}$ had been eliminated from eqs. (20) instead of $\lambda_{i\cdot}$, the result would have been

$$m_{ij} = n_{ij}(m_{\cdot j}/n_{\cdot j} + \lambda_{i\cdot} - j\text{-av.}\,\lambda_{i\cdot}). \tag{51}$$

Either eq. (50) or eq. (51) makes clear why the adjustment (19) is not proportionate by rows or columns, and why Case II does not break up into $r$ or $s$ sets of Case I. The reason is that $\lambda_{\cdot j}$ in any cell is not necessarily equal to the average $\lambda_{\cdot j}$ for that row, nor is $\lambda_{i\cdot}$ in any cell necessarily equal to the average $\lambda_{i\cdot}$ for that column. Suppose one nevertheless makes the simple proportionate adjustment

$$m'_{ij} = n_{ij}(m_{i\cdot}/n_{i\cdot}) \tag{52}$$

along the horizontal in the $i$-th row. The horizontal conditions (4) will then be enforced but not the vertical ones (5); i.e., it will be found that $m'_{i\cdot} = m_{i\cdot}$, but that usually not all $m'_{\cdot j} = m_{\cdot j}$. This is because eq. (52) effects only a partial adjustment. As eq. (50) shows, each $m'_{ij}$ is in error through the disparity between the $\lambda_{\cdot j}$ proper to the $j$-th column and the average of all the $\lambda_{\cdot j}$ for the $i$-th row. This error can then be diminished by turning the process around and subjecting these $m'_{ij}$ to a proportionate adjustment in the vertical according to the equation

$$m''_{ij} = m'_{ij}(m_{\cdot j}/m'_{\cdot j}), \tag{53}$$

which may be considered an application of eq. (51) in which the disparity between any $\lambda_{i\cdot}$ and the average $\lambda_{i\cdot}$ for the $j$-th column has been neglected. Now the vertical conditions will be found satisfied, but perhaps not all of the horizontal ones, because some of the row totals may have been disturbed. The cycle initiated by eq. (52) is therefore repeated. The process is continued until the table reproduces itself and becomes rigid with the satisfaction of all the conditions, both horizontal and vertical. The final results coincide with the least squares solution, which is thus accomplished without the use of normal equations.

Usually two cycles suffice. In practice the work proceeds rapidly, requiring only about one-seventh as much time as setting up the normal equations and solving them. Tables III-V show the various stages of the work when the method of iterative proportions is applied to the sample frequencies of Table I. The results of the third approximation (Table V) are final: if the process were continued, the table would only reproduce itself.
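The three stages of Tables III to V can be reproduced with a few lines of Python. This is a sketch under two assumptions:

1. The sample frequencies for rows 1 and 2 and for the cells $n_{41}$ and $n_{44}$ are our back-calculation from Table II, since they are not printed in this section.
2. Each stage is rounded to the nearest integer before the next stage begins, which is how we read the tables.

Under these assumptions, the computed stages agree with Tables III, IV and V in our check.

```python
import math

# ASSUMPTION: rows 1-2 and cells n_41, n_44 are back-calculated from Table II.
n = [
    [3623, 781, 557, 313],
    [1570, 395, 251, 155],
    [1553, 419, 264, 116],
    [10538, 2455, 1706, 1160],
    [1681, 353, 171, 154],
    [3882, 857, 544, 339],
]
m_row = [5252, 2395, 2432, 15766, 2330, 5662]   # expected m_i.
m_col = [22877, 5285, 3462, 2213]               # expected m_.j


def round_half_up(x):
    return math.floor(x + 0.5)


def by_rows(table, targets):
    """Eq. (52): scale each row to its expected total, rounding to integers."""
    return [[round_half_up(v * tgt / sum(row)) for v in row]
            for row, tgt in zip(table, targets)]


def by_cols(table, targets):
    """Eq. (53): scale each column to its expected total, rounding to integers."""
    totals = [sum(col) for col in zip(*table)]
    return [[round_half_up(v * targets[j] / totals[j]) for j, v in enumerate(row)]
            for row in table]


stage1 = by_rows(n, m_row)        # compare Table III
stage2 = by_cols(stage1, m_col)   # compare Table IV
stage3 = by_rows(stage2, m_row)   # compare Table V
for row in stage3:
    print(row)
```

Rounding at every stage is what leaves the one-unit discrepancies, such as 22875 against 22877, visible in the printed tables.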

The same process can be extended to three or more dimensions, with an even greater relative saving in time. To see how the method of iterative proportions applies in one of the three dimensional cases, we may go back to Case III. By the substitution afforded through eq. (36), the adjusting eq. (34) may be put into the form


TABLE III

The method of iterative proportions applied to the data of Table I. First stage: a proportionate adjustment by rows by eq. (52). Note that m'_i. = m_i., but that m'.j ≠ m.j

              j = 1    j = 2    j = 3    j = 4     m'_i.     m_i.
   i = 1       3608      778      555      312      5253      5252
   i = 2       1586      399      254      157      2396      2395
   i = 3       1606      433      273      120      2432      2432
   i = 4      10476     2441     1696     1153     15766     15766
   i = 5       1660      349      169      152      2330      2330
   i = 6       3910      863      548      341      5662      5662
   m'.j       22846     5263     3495     2235     33839
   m.j        22877     5285     3462     2213               33837


TABLE IV

A continuation of the process initiated in Table III. The figures in Table III are now adjusted proportionately by columns according to eq. (53). The vertical totals m''.j and m.j now are equal, but the agreement of the horizontal totals accomplished in Table III has been slightly disturbed

              j = 1    j = 2    j = 3    j = 4    m''_i.     m_i.
   i = 1       3613      781      550      309      5253      5252
   i = 2       1588      401      252      155      2396      2395
   i = 3       1608      435      270      119      2432      2432
   i = 4      10490     2451     1680     1142     15763     15766
   i = 5       1662      350      167      151      2330      2330
   i = 6       3915      867      543      338      5663      5662
   m''.j      22876     5285     3462     2214     33837
   m.j        22877     5285     3462     2213               33837


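$$m_{ijk} = n_{ijk}(m_{i\cdot\cdot}/n_{i\cdot\cdot} + \lambda_{\cdot j\cdot} + \lambda_{\cdot\cdot k} - i\text{-av.}\,\lambda_{\cdot j\cdot} - i\text{-av.}\,\lambda_{\cdot\cdot k}). \tag{54}$$

Equally well it could have been written

$$m_{ijk} = n_{ijk}(m_{\cdot j\cdot}/n_{\cdot j\cdot} + \lambda_{i\cdot\cdot} + \lambda_{\cdot\cdot k} - j\text{-av.}\,\lambda_{i\cdot\cdot} - j\text{-av.}\,\lambda_{\cdot\cdot k}), \tag{55}$$

or

$$m_{ijk} = n_{ijk}(m_{\cdot\cdot k}/n_{\cdot\cdot k} + \lambda_{i\cdot\cdot} + \lambda_{\cdot j\cdot} - k\text{-av.}\,\lambda_{i\cdot\cdot} - k\text{-av.}\,\lambda_{\cdot j\cdot}). \tag{56}$$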

Any of these three equations shows why the adjustment (34) is not proportional by slices, and why this case does not break up into $r$ or $s$ or $t$ sets of the three dimensional Case I. As these three equations now make clear, it does break up as a first approximation, and by making successive proportionate adjustments we may thus arrive at the least squares values. To go about the work, we could first calculate the values of

$$m'_{ijk} = n_{ijk}(m_{i\cdot\cdot}/n_{i\cdot\cdot}), \tag{57}$$

then

$$m''_{ijk} = m'_{ijk}(m_{\cdot j\cdot}/m'_{\cdot j\cdot}), \tag{58}$$

TABLE V

The cycle is commenced again. The figures of Table IV are subjected to a proportionate adjustment by rows, according to eq. (52). And since these results turn out to be almost a reproduction of Table IV but with both horizontal and vertical conditions satisfied, they are considered final. The agreement with the m_ij in Table I should be noted

              j = 1    j = 2    j = 3    j = 4   m'''_i.     m_i.
   i = 1       3612      781      550      309      5252      5252
   i = 2       1587      401      252      155      2395      2395
   i = 3       1608      435      270      119      2432      2432
   i = 4      10492     2451     1680     1142     15765     15766
   i = 5       1662      350      167      151      2330      2330
   i = 6       3914      867      543      338      5662      5662
   m'''.j     22875     5285     3462     2214     33836
   m.j        22877     5285     3462     2213               33837


followed by

$$m'''_{ijk} = m''_{ijk}(m_{\cdot\cdot k}/m''_{\cdot\cdot k}). \tag{59}$$

These three successive adjustments would constitute a cycle. The cycle would then be repeated, in whole or in part, until the table becomes rigid with the satisfaction of all three sets of conditions.
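For the three dimensional Case III, eqs. (57) to (59) form one cycle of slice-by-slice proportionate adjustments. Below is a NumPy sketch of that cycle, repeated until the slice totals stop changing. The data are random and hypothetical, and the tolerance and cycle limit are arbitrary choices.

```python
import numpy as np


def iterative_proportions_3d(n, m_i, m_j, m_k, tol=1e-9, max_cycles=1000):
    """Repeat the cycle of eqs. (57), (58), (59) until all slice totals hold."""
    m = np.asarray(n, dtype=float).copy()
    m_i, m_j, m_k = (np.asarray(x, dtype=float) for x in (m_i, m_j, m_k))
    for _ in range(max_cycles):
        m *= (m_i / m.sum(axis=(1, 2)))[:, None, None]   # eq. (57): i-slices
        m *= (m_j / m.sum(axis=(0, 2)))[None, :, None]   # eq. (58): j-slices
        m *= (m_k / m.sum(axis=(0, 1)))[None, None, :]   # eq. (59): k-slices
        # the k-slices are exact after eq. (59); stop when i and j also hold
        if (np.abs(m.sum(axis=(1, 2)) - m_i).max() < tol
                and np.abs(m.sum(axis=(0, 2)) - m_j).max() < tol):
            break
    return m


rng = np.random.default_rng(2)
n = rng.integers(5, 40, size=(3, 4, 2)).astype(float)   # hypothetical sample
u = rng.integers(5, 40, size=(3, 4, 2)).astype(float)   # hypothetical universe
m = iterative_proportions_3d(n, u.sum(axis=(1, 2)), u.sum(axis=(0, 2)), u.sum(axis=(0, 1)))
```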

6. Simplification when only one cell requires adjustment. It sometimes happens in sampling work that one is especially interested in one particular cell of the universe and would like to have a result for it in advance, before the other cells are adjusted. Sometimes it even happens that the other cells individually are of no particular concern. In such circumstances one merely places the cell





of interest in one corner of the table by an appropriate interchange of rows and columns, and then compresses the rest of the table into the cells adjacent to it. In the two dimensional Case II one would thus work with a 2 × 2 table. One corner cell is the cell of special interest, and the other three are the result of compression. The marginal totals of the row and column belonging to the cell of interest are unaffected. For illustration, suppose that from the sample shown in Table I we require only $m_{61}$. We then start with the 2 × 2 Table VI, which is derived from Table I by compression. Commencing with Table VI, one might first adjust by rows according to eq. (52), then by columns by eq. (53). One cycle of iterative proportions is sufficient, as is seen in Table

TABLE VI

Derived from Table I by compression, the cell i = 6, j = 1, requiring adjustment

                 j = 1    j = 2-4     n_i.      m_i.
   i = 1-5       18965       9250     28215     28175
   i = 6          3882       1740      5622      5662
   n.j           22847      10990     33837
   m.j           22877      10960               33837


TABLE VII

A proportionate adjustment of Table VI

      Rows adjusted by eq. (52)         Columns adjusted by eq. (53)
      18938     9237    28175           18962     9213    28175
       3910     1752     5662            3915     1747     5662
      22848    10989    33837           22877    10960    33837

Conclusion: m_61 = 3915


VII. The value 3915 found for $m_{61}$ is in good agreement with its value shown in Tables I and V. The scheme of compression provides a quick method of getting out an advance adjustment for a cell of special interest. The result so obtained will ordinarily agree well with what comes later, when and if all the cells are adjusted.
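The compression example of Tables VI and VII amounts to one cycle of eqs. (52) and (53) on a 2 × 2 table. Here is a sketch that rounds to the nearest integer at each step, which is our reading of how the tables were produced.

```python
import math


def round_half_up(x):
    return math.floor(x + 0.5)


def one_cycle(table, m_row, m_col):
    """Rows by eq. (52), then columns by eq. (53), rounding to integers."""
    rows = [[round_half_up(v * tgt / sum(r)) for v in r] for r, tgt in zip(table, m_row)]
    col_tot = [sum(c) for c in zip(*rows)]
    cols = [[round_half_up(v * m_col[j] / col_tot[j]) for j, v in enumerate(r)] for r in rows]
    return rows, cols


# Table VI: the cell i = 6, j = 1 is kept; everything else is compressed.
table_vi = [[18965, 9250],    # i = 1-5
            [3882, 1740]]     # i = 6
m_row = [33837 - 5662, 5662]      # m_i. for i = 1-5 and for i = 6
m_col = [22877, 33837 - 22877]    # m_.j for j = 1 and for j = 2-4
rows, cols = one_cycle(table_vi, m_row, m_col)
print(rows)
print(cols)
```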

In the three dimensional Cases II, III, V, VI, and VII, one compresses the original table to a 2 × 2 × 2 table and then uses the method of iterative proportions. (The other cases need no such treatment. They are proportionate adjustments, so one is already at liberty to adjust as few or as many cells as one likes without altering the equations or the routine.) The same procedure can be extended to the adjustment of two cells. The only modification is that in two dimensions we compress to a 2 × 3 or a 3 × 3 table, depending on whether or not the two cells lie in the same row or column. In three dimensions we compress to a 2 × 2 × 3, a 2 × 3 × 3, or a 3 × 3 × 3 table:

- the first if the two cells lie in the same $i$, $j$, or $k$ tube;
- the second if they lie in the same slice but not in the same tube;
- the third if they are in separate slices.

7. Some remarks on the accuracy of an adjustment. A least squares adjustment of sampling results serves two purposes. It is a systematic procedure for obtaining satisfaction of the conditions imposed. At the same time, it improves the data, in the sense of giving results of smaller variance than the sample itself, under ideal conditions of sampling from a stable universe. It must not be supposed that any or all of the adjusted $m_{ij}$ in any table are necessarily "closer to the truth" than the corresponding sample frequencies $n_{ij}$, even under ideal conditions. The standard errors of the adjusted results can easily be estimated for the ideal case by making use of the calculated chi-square. For predictive purposes, however, it is far preferable, in fact necessary, to get some idea of the errors of sampling by actual trial; prediction can be regarded as the only possible use of a census by any method, sample or complete. Such a trial might be a comparison of the sampling results with the universe, as can often be arranged by means of controls. There is another aspect to the problem of error: even a 100 per cent count, though strictly accurate, is not by itself useful for prediction, except so far as we can assert on other grounds what secular changes are taking place.

In conclusion it is a pleasure to record our appreciation of the assistance of 
Miss Irma D. Friedman and Mr. Wilson H. Grabill for putting the formulas 
and procedure into actual operation with census data, and thereby disclosing 
defects in earlier drafts of the manuscript. 

Bureau of the Census,

Washington



NOTES 

This section is devoted to brief research and expository articles, notes on methodology and other short items.


THE STANDARD ERRORS OF THE GEOMETRIC AND HARMONIC
MEANS AND THEIR APPLICATION TO INDEX NUMBERS¹

By Nilan Norris 

Attempts to derive useful expressions for estimating the standard deviations 
of the sampling errors of the geometric and harmonic means have not yielded 
results comparable with those afforded by the modern theory of estimation,
including fiducial inference. There are in the literature of probability theory 
certain theorems which can be applied to obtain these desired results in a 
straightforward manner. The use of forms for estimating standard errors is 
subject to certain conditions which are not always fulfilled, particularly in the 
case of time series. An understanding of these limitations should deter those 
who may be tempted to judge the significance of phenomena such as price 
changes solely on the basis of estimated standard errors of indexes. 

1. Statement of formulas. The standard error of the geometric mean of a sequence of positive independent chance variables denoted by $x_i = x_1, x_2, \ldots, x_n$ is

$$\sigma_G = \frac{\theta_1\,\sigma_{\log x}}{\sqrt{n}}.$$

Here $\theta_1$ is the population geometric mean of the variates, and $\sigma_{\log x}$ is the standard deviation of the logarithms in the population, given by $\sigma_{\log x} = [E\{[\log x - E(\log x)]^2\}]^{1/2}$; $n$ is the number of individuals comprising the sample. The estimate of the standard error of the geometric mean is

$$s_G = \frac{G\,s_{\log x}}{\sqrt{n - 1}},$$

where $G$ is the sample geometric mean, that is, the estimate of $\theta_1$, and $s_{\log x}$ is the estimate of $\sigma_{\log x}$; $n - 1$ is the number of degrees of freedom of the sample.
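The estimate $s_G$ is simple to compute. This minimal sketch rests on two assumptions of ours:

- natural logarithms are used (the derivation in section 2 expands $e^{\log G}$);
- $s_{\log x}$ has divisor $n$, so that $n - 1$ enters only through $\sqrt{n - 1}$.

```python
import math


def geometric_mean_se(xs):
    """Return (G, s_G) with s_G = G * s_log / sqrt(n - 1).

    ASSUMPTIONS: natural logarithms; s_log computed with divisor n.
    """
    n = len(xs)
    logs = [math.log(x) for x in xs]
    mean_log = sum(logs) / n
    s_log = math.sqrt(sum((v - mean_log) ** 2 for v in logs) / n)
    G = math.exp(mean_log)
    return G, G * s_log / math.sqrt(n - 1)


G, s_G = geometric_mean_se([1.0, math.e ** 2])
print(G, s_G)   # both are e, up to floating-point rounding, for this two-point sample
```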

¹ This article summarizes two papers presented at sessions of the Institute of Mathematical Statistics at Detroit, Michigan, on December 27, 1938, and at Philadelphia, Pennsylvania, on December 27, 1939. The results given herein can be derived by several methods, which vary somewhat as to degree of rigor. The writer wishes to acknowledge his indebtedness to the referee for suggesting a proof based on a probability theorem stated by J. L. Doob, "The limiting distributions of certain statistics," Annals of Math. Stat., Vol. 6 (1935), pp. 160-169. The standard deviation formulas obtained follow as an application of this theorem, as will be seen by reference to it. Obviously the asymptotic variance formulas of many other statistics (estimates of parameters) can be obtained in a similar manner.
The standard error of the harmonic mean of a sequence of positive independent chance variables denoted by $x_i = x_1, x_2, \ldots, x_n$ is

$$\sigma_H = \frac{\theta_2^2\,\sigma_{1/x}}{\sqrt{n}}.$$

Here the population harmonic mean of the variates is $\theta_2 = 1/\alpha = [E(1/x)]^{-1}$, and the standard deviation of $1/x$ in the population is $\sigma_{1/x} = [E\{[1/x - E(1/x)]^2\}]^{1/2}$; $n$ is the number of observations comprising the sample. The estimate of the standard error of the harmonic mean is

$$s_H = \frac{s_{1/x}}{a^2\sqrt{n - 1}},$$

where the estimate of $\alpha$ is $a = 1/H = \frac{1}{n}\sum 1/x_i$, and $s_{1/x}$ is the standard deviation of the reciprocals of the observations comprising the sample; $n - 1$ is the number of degrees of freedom of the sample.
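The harmonic-mean estimate follows the same pattern, with reciprocals in place of logarithms. Since $a = 1/H$, the estimate can equally be written $H^2 s_{1/x}/\sqrt{n - 1}$. In this sketch, a divisor of $n$ in $s_{1/x}$ is our assumption.

```python
import math


def harmonic_mean_se(xs):
    """Return (H, s_H) with s_H = s_recip / (a^2 sqrt(n - 1)) = H^2 s_recip / sqrt(n - 1).

    ASSUMPTION: s_recip computed with divisor n.
    """
    n = len(xs)
    recips = [1.0 / x for x in xs]
    a = sum(recips) / n                      # estimate of alpha = E(1/x)
    s_recip = math.sqrt(sum((v - a) ** 2 for v in recips) / n)
    H = 1.0 / a
    return H, s_recip / (a * a * math.sqrt(n - 1))


H, s_H = harmonic_mean_se([1.0, 0.5])
print(H, s_H)   # H = 2/3 and s_H = 2/9 for this two-point sample
```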


2. Derivation of formulas. These formulas can be obtained by application of the Laplace-Liapounoff theorem,² as follows. Let $x_i = x_1, x_2, \ldots, x_n$ be a set of positive independent chance variables with the same distribution function. Assume that the expectations $E(x_i)$ and $E(x_i^2)$ exist, and that $\sigma_i^2 = E\{[x_i - E(x_i)]^2\} > 0$. The last condition is imposed to eliminate the trivial case in which the $x_i$ are all equal and their distribution is confined to a single point. The geometric mean of the $x_i$ is $G = (x_1 x_2 \cdots x_n)^{1/n}$, and the harmonic mean of the $x_i$ is

$$H = \frac{n}{\sum_{i=1}^{n} 1/x_i}.$$

It is necessary to assume that both $\sigma_{\log x}$ and $\sigma_{1/x}$ are finite. For both $\log x$ and $1/x$, at least one moment of order higher than two must also be finite. The requirement that the variance and at least one higher moment be finite can be weakened in various ways. This is a trivial consideration, however, since nearly all distributions of any importance have finite third moments.³ Certain rarely occurring types of distributions, such as the Cauchy distribution, have infinite variance. In such cases, standard error formulas as ordinarily used are not valid.

Let $E(\log x) = \xi$ and $E(1/x) = \alpha$. By the Laplace-Liapounoff theorem, except for terms of order $1/\sqrt{n}$, the limiting distributions of

$$\frac{\sqrt{n}\,(\log G - \xi)}{\sigma_{\log x}} \quad\text{and}\quad \frac{\sqrt{n}\,(H^{-1} - \alpha)}{\sigma_{1/x}}$$

are normal with zero arithmetic means and unit variances. That is, let $C$ represent a set of conditions on chance variables, and let $P\{C\}$ be the probability that these conditions are satisfied. Then

$$\lim_{n\to\infty} P\left\{\frac{\sqrt{n}\,(\log G - \xi)}{\sigma_{\log x}} \le t\right\} = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{t} e^{-u^2/2}\,du.$$

² A. Khintchine, Asymptotische Gesetze der Wahrscheinlichkeitsrechnung, Ergebnisse der Mathematik und ihrer Grenzgebiete, J. Springer, Berlin, 1933, Vol. II, No. 4, pp. 1-8; J. L. Doob, op. cit., pp. 160-169; and S. S. Wilks, Statistical Inference, 1936-1937, Edwards Brothers, Inc., Ann Arbor, 1937, pp. 39 f.

³ For a more detailed discussion of this matter see Wilks, op. cit., pp. 39 f.

To use these relations in obtaining the limiting distributions of the geometric and harmonic means, we must suppose two things about the sequence of random chance variables $V_i$:

- it converges in probability (converges stochastically) to $p$;
- the sequence $\sqrt{n}(V_i - p)$ has a normal limiting distribution with zero arithmetic mean and variance $\sigma^2$.

We must also assume that the real-valued function $f(x)$ has a Taylor expansion valid in the neighborhood of $p$. If $f'(p) \ne 0$, only the first two terms of the series are needed. The required expansion is

$$f(x) = f(p) + (x - p)f'(p) + \tfrac{1}{2}(x - p)^2 f''[p + \beta(x - p)],$$

where $0 < \beta < 1$ and $f''(x)$ is continuous in the neighborhood of $p$. When these conditions are fulfilled, the limiting distribution of $\sqrt{n}[f(V_i) - f(p)]$ is normal, with an arithmetic mean of zero and a variance of $\sigma^2[f'(p)]^2$.

Let $f(\log G) = e^{\log G} = G$, and use the expansion

$$e^{\log G} = e^{\xi} + (\log G - \xi)e^{\xi} + \tfrac{1}{2}(\log G - \xi)^2 e^{\xi + \beta(\log G - \xi)}.$$

Since $\theta_1 = e^{\xi}$, it follows that the limiting distribution of $\sqrt{n}(G - \theta_1)$ is normal, with an arithmetic mean of zero and a variance of $\theta_1^2\sigma_{\log x}^2$.

Similarly, it can be shown that the limiting distribution of $\sqrt{n}(H - \theta_2)$ is normal, with an arithmetic mean of zero and a variance of $\theta_2^4\sigma_{1/x}^2$, where $\theta_2 = 1/\alpha = [E(1/x)]^{-1}$.
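The asymptotic variance $\theta_2^4\sigma_{1/x}^2/n$ can be checked by simulation. In this sketch, the reciprocals $1/x$ are drawn from a uniform distribution on (1, 3). This is an arbitrary choice, for which $\alpha = 2$, $\theta_2 = 1/2$ and $\sigma_{1/x} = 2/\sqrt{12}$. The spread of the simulated harmonic means should come out close to the predicted standard error. The sample size, number of replications and seed are also arbitrary.

```python
import math
import random
import statistics


def simulate_harmonic_means(n, reps, seed=12345):
    """Draw samples whose reciprocals 1/x are uniform on (1, 3); return the H values."""
    rng = random.Random(seed)
    hs = []
    for _ in range(reps):
        recips = [rng.uniform(1.0, 3.0) for _ in range(n)]
        hs.append(n / sum(recips))   # harmonic mean of the x's
    return hs


n, reps = 400, 2000
alpha = 2.0                          # E(1/x) for 1/x ~ Uniform(1, 3)
sigma_recip = 2.0 / math.sqrt(12.0)  # standard deviation of Uniform(1, 3)
theta2 = 1.0 / alpha                 # population harmonic mean
predicted = theta2 ** 2 * sigma_recip / math.sqrt(n)
observed = statistics.pstdev(simulate_harmonic_means(n, reps))
print(predicted, observed)
```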

It is of some interest to observe that the expressions for the standard errors of the geometric and harmonic means correspond with the forms previously given for the standard errors of two efficient ratio-measures of relative variation,⁴ namely,

9i , el 

Vau — ^ ana Vbiq = Vata > 

where e\/e is the population geometric-arithmetic ratio, and is the popula- 
tion harmonic-geometric ratio. 

3. Limitations of standard-error estimates. Application of these formulas is subject to the usual conditions for drawing sound inferences on the basis of the representative method. Fiducial argument should be employed to avoid certain untenable assumptions of the outmoded method of using standard errors. Estimates of the standard deviations of sampling errors do not constitute an ultimate test of significance that can be applied with a high degree of success to all types of problems. In particular, such estimates cannot be relied upon with a high degree of confidence when they are used as tests of significance for index numbers. Nearly all time series show an appreciable degree of serial correlation, persistence, or lack of independence among successive items of any sample.

⁴ Nilan Norris, "Some efficient measures of relative dispersion," Annals of Math. Stat., Vol. 9 (1938), pp. 214-220.

4. Bibliographical note. Certain aspects of the sampling distribution of the geometric mean have been discussed by Burton H. Camp.⁵ Attempts to derive formulas for estimating the standard errors of index numbers have been made by Truman L. Kelley⁶ and Irving Fisher,⁷ and an empirical study of the sampling fluctuations of indexes has been made by E. C. Rhodes.⁸ Although various special tests of significance for time series have been proposed,⁹ no generally satisfactory procedure has yet appeared.

Hunter College,

New York, N. Y.

⁵ Burton H. Camp, "Notes on the distribution of the geometric mean," Annals of Math. Stat., Vol. 9 (1938), pp. 221-226.

⁶ Truman L. Kelley, "Certain properties of index numbers," Quarterly Publications of the Am. Stat. Assn., Vol. 17, New Series 135, Sept., 1921, pp. 826-841.

⁷ Irving Fisher, The Making of Index Numbers, Houghton Mifflin Company, New York, 1927, 3d ed., pp. 21^-229, 342-345, and Appendix I, pp. 407 and 430 f.

⁸ E. C. Rhodes, "The precision of index numbers," Roy. Stat. Soc. Jour., Vol. 99 (1936), Part I, pp. 142-146, and Part II, pp. 367-369.

⁹ Some of the more recent papers dealing with this matter are: G. Tintner, "On tests of significance in time series," Annals of Math. Stat., Vol. 10 (1939), pp. 139-143; "The analysis of economic time series," Am. Stat. Assn. Jour., Vol. 35 (1940), pp. 93-100; L. R. Hafstad, "On the Bartels technique for time-series analysis, and its relation to the analysis of variance," Am. Stat. Assn. Jour., Vol. 35 (1940), pp. 347-361; and Lila F. Knudsen, "Interdependence in a series," Am. Stat. Assn. Jour., Vol. 35 (1940), pp. 507-514.


A NOTE ON THE USE OF A PEARSON TYPE III FUNCTION IN
RENEWAL THEORY

By A. W. Brown

One of the methods suggested by A. J. Lotka¹ for the derivation of the renewal function may be briefly summarized as follows.

The method consists of dissecting the total renewal function into "generations." The original installation constitutes the zero generation. The units introduced to replace disused units of the zero generation constitute the first generation, the renewals of these the second, and so on. Let $f(x)$ be the "mortality" function, the same for all generations; $f(x)$ satisfies the usual conditions of a distribution function. Adopting Lotka's notation, let $N$ be the number of units in the original collection, and $B_1(t)\,dt$ the number of objects introduced between times $t$ and $t + dt$ and belonging to the first generation. Let $B_2(t)\,dt$ be the similar expression for the second generation, and so on. Then $B_1(t)/N$, $B_2(t)/N$, ... may be regarded as renewal density functions for the various generations.

¹ A. J. Lotka, "A contribution to the theory of self-renewing aggregates, with special reference to industrial replacement," Annals of Math. Stat., Vol. 10 (1939), p. 1.

Now, evidently,

$$B_1(t) = N f(t) \tag{1}$$

$$B_2(t) = \int_0^t B_1(t-x)\, f(x)\, dx \tag{2}$$

and in general

$$B_{i+1}(t) = \int_0^t B_i(t-x)\, f(x)\, dx. \tag{3}$$

Summation of the contributions of the successive generations gives for the total renewal at the time t

$$B(t) = B_1(t) + \int_0^t B(t-x)\, f(x)\, dx. \tag{4}$$
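As a cross-check on the generation-by-generation argument, equation (4) can also be solved numerically by discretizing the convolution integral. The sketch below is our illustration, not Lotka's or the author's method: it assumes the Pearson Type III density f(x) = c^k x^{k-1} e^{-cx}/Γ(k) used in this note, assumes k > 1 (so that f(0) = 0), and replaces the integral by the trapezoidal rule with a hypothetical step size `h`. It works with b(t) = B(t)/N, so the factor N drops out; the function names are our own.

```python
import math


def type_iii_density(x: float, k: float, c: float) -> float:
    """Pearson Type III density f(x) = c^k x^(k-1) e^(-cx) / Gamma(k) for x > 0.

    Assumes k > 1, so that f(0) = 0.
    """
    if x <= 0.0:
        return 0.0
    log_f = k * math.log(c) + (k - 1.0) * math.log(x) - c * x - math.lgamma(k)
    return math.exp(log_f)


def renewal_density_numeric(t_max: float, k: float, c: float, h: float = 0.01) -> list[float]:
    """Approximate b(t) = B(t)/N on the grid t = 0, h, 2h, ..., t_max.

    Equation (4) divided by N reads b(t) = f(t) + int_0^t b(t - x) f(x) dx;
    the integral is replaced by the trapezoidal rule on the grid x_j = j*h.
    """
    steps = int(round(t_max / h))
    f = [type_iii_density(j * h, k, c) for j in range(steps + 1)]
    b = [0.0] * (steps + 1)
    for i in range(steps + 1):
        # Trapezoid end points: x = t contributes b(0) f(t) / 2, and x = 0
        # contributes b(t) f(0) / 2, which vanishes because f(0) = 0 for k > 1.
        conv = 0.5 * b[0] * f[i]
        # Interior points x_j = j*h, j = 1, ..., i-1, each with weight 1.
        conv += sum(b[i - j] * f[j] for j in range(1, i))
        b[i] = f[i] + h * conv
    return b


# Usage: for k = 2, c = 1 the exact value is (1 - e^{-2t})/2, the k = 2 form of B(t)/N.
b = renewal_density_numeric(1.0, k=2.0, c=1.0, h=0.01)
print(b[-1], 0.5 * (1.0 - math.exp(-2.0)))
```

The cost grows with the square of the number of grid points, which is acceptable for the short horizons used in this note; a finer `h` trades speed for accuracy.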


In this note we propose to use a Pearson Type III function for f(x) and observe what form our equations then assume. The Pearson Type III function

$$f(x) = \frac{c^k}{\Gamma(k)}\, x^{k-1} e^{-cx}, \qquad x \ge 0,$$

is suited to many practical situations. The two parameters c and k give it a considerable amount of flexibility. The fact that this function has an unlimited range in one direction is relatively unimportant from a practical point of view, as is well known from the experience of fitting curves of this type to skewed data with limited range. Of course the question of whether a Type III curve is appropriate can be answered more objectively by using the usual Pearson curve-fitting criteria β_1, β_2, and κ. We have, then, substituting in (1),


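$$B_1(t) = \frac{N c^k}{\Gamma(k)}\, t^{k-1} e^{-ct}, \tag{5}$$

and from (2)

$$B_2(t) = \frac{N c^{2k}}{[\Gamma(k)]^2} \int_0^t (t-x)^{k-1} e^{-c(t-x)}\, x^{k-1} e^{-cx}\, dx \tag{6}$$

$$\phantom{B_2(t)} = \frac{N c^{2k} e^{-ct}}{[\Gamma(k)]^2} \int_0^t (t-x)^{k-1} x^{k-1}\, dx. \tag{7}$$

If, now, we set x = ty, the integral in (7) reduces to

$$t^{2k-1} \int_0^1 (1-y)^{k-1} y^{k-1}\, dy = t^{2k-1}\, \frac{[\Gamma(k)]^2}{\Gamma(2k)}.$$

Hence,

$$B_2(t) = \frac{N c^{2k}}{\Gamma(2k)}\, t^{2k-1} e^{-ct}, \tag{8}$$

and in general

$$B_i(t) = \frac{N c^{ik}}{\Gamma(ik)}\, t^{ik-1} e^{-ct}. \tag{9}$$

Summing the contributions of the several generations, we have for the total renewal function

$$B(t) = N c\, e^{-ct} \left[ \frac{(ct)^{k-1}}{\Gamma(k)} + \frac{(ct)^{2k-1}}{\Gamma(2k)} + \cdots \right]. \tag{10}$$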


If k is a positive integer ≥ 3, (10) can easily be summed to a form which shows immediately its damped periodic nature. Even if k is positive but not an integer, it can be shown by continuity considerations that the function B(t) defined by (10) has periodic properties.

Assuming k to be a positive integer, then, and setting z = ct, we may write the expression in brackets in (10) as


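$$\frac{z^{k-1}}{(k-1)!} + \frac{z^{2k-1}}{(2k-1)!} + \frac{z^{3k-1}}{(3k-1)!} + \cdots = f(z). \tag{11}$$

Then

$$\frac{d^k f}{dz^k} = f(z),$$

and upon making the trial substitution f(z) = A e^{mz}, we get

$$A m^k e^{mz} = A e^{mz}.$$

Hence,

$$m^k = 1.$$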


Taking unity in its complex form

$$1 = \cos 2n\pi + i \sin 2n\pi,$$

we have that

$$m_n = \sqrt[k]{1} = \cos\frac{2n\pi}{k} + i \sin\frac{2n\pi}{k}, \tag{12}$$

where n = 0, 1, 2, ..., k − 1. Then

$$f(z) = \sum_{n=0}^{k-1} A_n e^{m_n z}$$

and, differentiating,

$$f^{(j)}(z) = \sum_{n=0}^{k-1} A_n m_n^{j} e^{m_n z}, \qquad j = 1, 2, \ldots, k-1.$$

Now setting z = 0, we get

$$\begin{aligned}
f(0) &= A_0 + A_1 + \cdots + A_{k-1} = 0,\\
f'(0) &= A_0 m_0 + A_1 m_1 + \cdots + A_{k-1} m_{k-1} = 0,\\
&\;\;\vdots\\
f^{(k-1)}(0) &= A_0 m_0^{k-1} + A_1 m_1^{k-1} + \cdots + A_{k-1} m_{k-1}^{k-1} = 1,
\end{aligned}$$

k equations to determine the k constants. We know that A_n is equal to the ratio of two determinants formed from the coefficients of the above equations. This ratio reduces to

$$A_n = \frac{1}{(m_n - m_0)\cdots(m_n - m_{n-1})(m_n - m_{n+1})\cdots(m_n - m_{k-1})}. \tag{13}$$
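Because the m_j in (12) are exactly the k roots of z^k − 1, the product in the denominator of (13) can be simplified further. The short derivation below is our addition, not the author's; it uses only (12), (13), and the factorization z^k − 1 = ∏_j (z − m_j):

$$\prod_{j \ne n} (m_n - m_j) = \left.\frac{d}{dz}\left(z^k - 1\right)\right|_{z = m_n} = k\, m_n^{k-1}, \qquad \text{hence} \qquad A_n = \frac{1}{k\, m_n^{k-1}} = \frac{m_n}{k},$$

since m_n^k = 1. For k = 2, for example, m_0 = 1 and m_1 = −1 give A_0 = 1/2 and A_1 = −1/2.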

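We have, then, an expression for the k constants in terms of the k roots of unity. Therefore, for any particular value of k we can obtain the sum of our series from the relation

$$f(z) = \sum_{n=0}^{k-1} A_n e^{m_n z}.$$

Hence, under the assumption that k is a positive integer, we have

$$B(t) = N c\, e^{-ct} \sum_{n=0}^{k-1} A_n e^{m_n c t}. \tag{14}$$

The forms of B(t) for k = 1, 2, 3, 4 are, respectively,

$$B(t) = N c,$$

$$B(t) = \tfrac{1}{2} N c \left(1 - e^{-2ct}\right),$$

$$B(t) = \tfrac{1}{3} N c \left[1 - e^{-3ct/2}\left(\cos\tfrac{\sqrt{3}}{2}ct + \sqrt{3}\,\sin\tfrac{\sqrt{3}}{2}ct\right)\right],$$

$$B(t) = N c\, e^{-ct}\left[\tfrac{1}{4}\left(e^{ct} - e^{-ct}\right) - \tfrac{1}{2}\sin ct\right].$$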
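The roots-of-unity construction of (12) to (14) is easy to check numerically. The sketch below, our illustration rather than anything in the original note, builds the k-th roots of unity with Python's `cmath`, forms each A_n as the reciprocal product of (13), and compares the resulting B(t)/N with the closed forms for k = 1, 2, 3, 4. The function name and the test point c = 1, t = 1.5 are hypothetical; the k = 3 closed form used here is the one derived from (13) and (14).

```python
import cmath
import math


def renewal_density_integer_k(t: float, k: int, c: float) -> float:
    """B(t)/N for a positive integer k, from equations (12) to (14)."""
    # (12): m_n = cos(2 n pi / k) + i sin(2 n pi / k), n = 0, ..., k - 1
    roots = [cmath.exp(2j * math.pi * n / k) for n in range(k)]
    total = 0j
    for n, m_n in enumerate(roots):
        # (13): A_n = 1 / prod_{j != n} (m_n - m_j); the product is empty (= 1) when k = 1
        denom = complex(1.0)
        for j, m_j in enumerate(roots):
            if j != n:
                denom *= m_n - m_j
        total += cmath.exp(m_n * c * t) / denom
    # (14): B(t)/N = c e^{-ct} sum_n A_n e^{m_n c t}; imaginary parts cancel in conjugate pairs
    return c * math.exp(-c * t) * total.real


# Closed forms for k = 1, 2, 3, 4 (divided by N), evaluated at c = 1, t = 1.5.
c, t = 1.0, 1.5
s3 = math.sqrt(3.0) / 2.0 * c * t
closed = {
    1: c,
    2: 0.5 * c * (1.0 - math.exp(-2.0 * c * t)),
    3: c / 3.0 * (1.0 - math.exp(-1.5 * c * t) * (math.cos(s3) + math.sqrt(3.0) * math.sin(s3))),
    4: c * math.exp(-c * t) * (0.25 * (math.exp(c * t) - math.exp(-c * t)) - 0.5 * math.sin(c * t)),
}
for k, value in closed.items():
    print(k, renewal_density_integer_k(t, k, c), value)
```

Building A_n from the explicit product keeps the sketch close to (13); for large k one could instead use the simplification A_n = m_n/k.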


Although the above procedure is valuable particularly because it brings to light something of the nature of our renewal function, the forms derived above can actually be used to obtain values of B(t) for various values of t. However, for extensive numerical work a better method is at hand, which does not even depend on the assumption of an integral value for k.

Let us return once again to equation (10), which may be written in the following form:

$$\frac{B(t)}{N c} = \left[\frac{e^{-ct}(ct)^{k-1}}{\Gamma(k)} + \frac{e^{-ct}(ct)^{2k-1}}{\Gamma(2k)} + \cdots\right]. \tag{15}$$
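Equation (15) can also be summed directly by machine, which sidesteps the table interpolation the author relied on. The sketch below is our illustration, not the author's procedure: it evaluates each term e^{-ct}(ct)^{ik-1}/Γ(ik) in logarithms with Python's `math.lgamma` and stops once the terms have passed their peak and become negligible. The function name `renewal_density_series` and the tolerance `tol` are hypothetical.

```python
import math


def renewal_density_series(t: float, k: float, c: float, tol: float = 1e-16) -> float:
    """B(t)/N from equation (15): c * sum_{i>=1} e^{-ct} (ct)^{ik-1} / Gamma(ik).

    Works for non-integer k > 0 when t > 0; at t = 0 it returns 0, which is
    correct only for k > 1.
    """
    if t <= 0.0:
        return 0.0
    z = c * t
    total = 0.0
    i = 1
    while True:
        p = i * k - 1.0  # power of (ct) in the i-th generation term
        term = math.exp(-z + p * math.log(z) - math.lgamma(i * k))
        total += term
        # The terms rise until ik - 1 is about ct, then fall off rapidly.
        if p > z and term < tol:
            break
        i += 1
    return c * total


# Usage with the moment estimates k = 4.62, c = .462 obtained later in the note.
for t in range(0, 21, 5):
    print(t, round(renewal_density_series(float(t), 4.62, 0.462), 4))
```

Because this sum involves no interpolation, its values may differ slightly from a table computed with first-order interpolation. For large t the sum settles at c/k, the reciprocal of the mean life k/c.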
If k and c are determined by the method of moments (using two moments), k will not, in general, be a positive integer. However, by using the Tables of the Incomplete Gamma Function edited by Karl Pearson, one can compute values of B(t) without much difficulty. In these tables the function I(u, p) is tabulated for various values of u and p, where I(u, p) is defined by

$$I(u, p) = \frac{1}{\Gamma(p+1)} \int_0^{u\sqrt{p+1}} e^{-v} v^{p}\, dv. \tag{16}$$

If we let ξ = u_1√(p + 1) = u_0√p, then upon integrating by parts we find

$$\frac{e^{-\xi}\,\xi^{p}}{\Gamma(p+1)} = I(u_0, p-1) - I(u_1, p). \tag{17}$$

The left-hand member of this equation is of the same form as each of the terms of the series in brackets in (15), with ξ = ct and p = ik − 1 for the i-th term. Hence, the value of the renewal function for a particular time t is directly obtainable by summation of the right-hand member of (17) for successive significant values of the argument p.
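Identity (17) can be verified numerically. In the sketch below, which is our illustration and not the author's computation, Pearson's I(u, p) is evaluated as the regularized lower incomplete gamma function P(p + 1, x) at x = u√(p + 1), using its standard power series instead of the printed tables. The names `pearson_I` and `generation_term` are hypothetical; the check uses the first generation term of (15) at t = 10 with k = 4.62, c = .462.

```python
import math


def pearson_I(u: float, p: float, max_terms: int = 1000) -> float:
    """Pearson's incomplete gamma ratio I(u, p) of equation (16), for p > -1.

    I(u, p) equals the regularized lower incomplete gamma function P(p + 1, x)
    at x = u * sqrt(p + 1), computed here from the power series
    P(a, x) = x^a e^{-x} / Gamma(a + 1) * sum_{n>=0} x^n / ((a+1)...(a+n)).
    """
    a = p + 1.0
    x = u * math.sqrt(a)
    if x <= 0.0:
        return 0.0
    lead = math.exp(a * math.log(x) - x - math.lgamma(a + 1.0))
    term = total = 1.0
    for n in range(1, max_terms):
        term *= x / (a + n)
        total += term
        if term < 1e-17 * total:
            break
    return lead * total


def generation_term(xi: float, p: float) -> float:
    """Left-hand member of (17): e^{-xi} xi^p / Gamma(p + 1)."""
    return math.exp(-xi + p * math.log(xi) - math.lgamma(p + 1.0))


# First generation term of (15) at t = 10: xi = ct and p = k - 1.
xi, p = 0.462 * 10.0, 4.62 - 1.0
u0, u1 = xi / math.sqrt(p), xi / math.sqrt(p + 1.0)
print(generation_term(xi, p), pearson_I(u0, p - 1.0) - pearson_I(u1, p))
```

The two printed numbers should agree to rounding error, which is the content of (17).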

By way of illustration a numerical example will be considered. The data are taken from E. B. Kurtz’s book entitled Life Expectancy of Physical Property. In this book the author makes a study of retirement rates of fifty-two different types of physical property, and finds that their replacement curves fall into seven distinct groups. We consider here Group VII, which happens to be the largest group, embracing seventeen different types of industrial equipment out of the fifty-two examined. Using Kurtz’s replacement data² we obtain for the values of the first and second moments

$$\mu_1 = 10.002, \qquad \mu_2 = 121.71,$$

and from these, by the method of moments, we find

$$k = 4.62, \qquad c = .462.$$
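For the Type III density of this note the mean is k/c and the variance is k/c², so the method of moments gives k = μ_1²/σ² and c = μ_1/σ². The sketch below assumes that μ_2 = 121.71 is the second moment about the origin, so that σ² = μ_2 − μ_1²; that reading is our assumption, but it reproduces the quoted k and c. The function name is hypothetical.

```python
def type_iii_moment_fit(mu1: float, mu2: float) -> tuple[float, float]:
    """Method-of-moments fit of f(x) = c^k x^(k-1) e^(-cx) / Gamma(k).

    Assumes mu2 is the raw second moment E[x^2], so the variance is mu2 - mu1^2.
    For this density the mean is k/c and the variance is k/c^2.
    """
    variance = mu2 - mu1 ** 2
    if variance <= 0.0:
        raise ValueError("mu2 must exceed mu1**2")
    k = mu1 ** 2 / variance
    c = mu1 / variance
    return k, c


k, c = type_iii_moment_fit(10.002, 121.71)
print(round(k, 2), round(c, 3))  # -> 4.62 0.462
```

By the same parameterization c/k = 1/μ_1 ≈ 0.1000, the long-run level that B(t)/N approaches for large t.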

We then proceed to calculate values of B(t)/N by means of Pearson’s Tables,³ obtaining the results shown in the following table.

² E. B. Kurtz, Life Expectancy of Physical Property, Ronald Press, 1930, Table 22, page 86.

³ With regard to the method of interpolation employed in the calculations, it should be mentioned that it was found advisable to use the Mid-panel Central Difference Formula (xxiii) on page xii of the introduction to Pearson’s Tables, and that it is quite sufficient for our purposes to calculate only first-order terms.
| t | B(t)/N | t | B(t)/N |
|---|--------|---|--------|
| 0 | .0000 | 10 | .1049 |
| 1 | .0016 | 11 | .1043 |
| 2 | .0103 | 12 | .1028 |
| 3 | .0279 | 13 | .1006 |
| 4 | .0486 | 14 | .0990 |
| 5 | .0714 | 15 | .0994 |
| 6 | .0867 | 16 | .1009 |
| 7 | .0980 | 17 | .1013 |
| 8 | .1039 | 18 | .0992 |
| 9 | .1066 | 19 | .0999 |
| | | 20 | .0993 |


In conclusion the author wishes to thank Professor S. S. Wilks for various 
suggestions he has made in connection with this note. 

Princeton University,

Princeton, N. J. 


ESTIMATES OF PARAMETERS BY MEANS OF LEAST SQUARES
By Evan Johnson, Jr.

As a criterion for comparing estimates of a parameter of a universe, of known 
type of distribution, the use of the principle of least squares is suggested. A 
criterion may be stated in rather general terms. Its application to any given 
problem presumes a knowledge of the distribution functions of the estimates 
considered. In the present paper a criterion is set up and application of it is
made in the estimation of the mean and of the square of standard deviation of a 
normal universe. 

We shall use the symbol θ to represent a parameter to be estimated. It is to be remembered that θ is a constant throughout any problem, that it represents an unknown value, and that observations and functions of observations (called estimates) are the only variables that occur. We shall use the symbols x_i, i = 1, 2, ..., n, to represent observed values of the variable x of the universe, and the symbol F to represent a given function of the observations x_i.

If we choose to consider a given function F as an estimate of θ, we are then interested in the error F − θ. This quantity differs from the so-called residual of least-squares theory, since we are here interested in the difference between computed and true values, rather than in the difference between observed and computed values. To avoid any possible confusion we shall refer to F − θ as the error. Over the set of all samples of n observations x_i, the distribution of the errors F − θ is expressed by means of the distribution function f(F), which may be computed from the known distribution function of the universe. We shall assume that the function f(F) has been normalized, so that ∫_α^β f(F) dF = 1, where the interval from α to β includes all possible values of F. The integral

$$I = \int_\alpha^\beta (F - \theta)^2 f(F)\, dF,$$

associated with a given estimate F, may be thought of as the average square error over the set of all samples.

In this notation we shall state a criterion for the judgment of estimates in either of the two following forms:

Definition 1. Let f_1 be the distribution function of F_1, and f_2 that of F_2. The estimate F_1 of θ will be judged better than the estimate F_2 if

$$\int_\alpha^\beta (x - \theta)^2 f_1(x)\, dx < \int_\alpha^\beta (x - \theta)^2 f_2(x)\, dx.$$

Definition 2. From a given class of functions, of which F is a member, F will be called the best estimate if

$$\int_\alpha^\beta (F - \theta)^2 f(F)\, dF \tag{1}$$

is less than the corresponding integral for all other functions of the class.

It is to be observed that the integral I is a function of the quantities θ and f. From this is seen at once the distinction between the present problem of minimizing the average square error and the similar problem of finding that point around which the mean square value of the deviations of a variable is a minimum. In the problem under consideration we wish to find the function F, or more precisely its distribution function f(F), for which I takes its minimum with a fixed value of θ. In the alternative problem we have a given distribution f and we wish to find the minimum of I with respect to θ.

A second observation to be made is that the integral I cannot be usefully minimized in the sense of the general conditions of the calculus of variations. The problem would be of the isoperimetric variety, with the side condition ∫_α^β f(x) dx = 1. A solution might be expressed as the limit, as a approaches zero, of functions f(x) with proper continuity conditions, such that

$$f(x) = 0 \ \text{ when } |x - \theta| \ge a, \qquad f(x) > 0 \ \text{ when } |x - \theta| < a, \qquad \int_\alpha^\beta f(x)\, dx = 1.$$

Such a solution would be meaningless in practical statistical theory. Solutions are to be expected, therefore, only in those cases where the class of functions from which F is to be selected is sufficiently restricted.

The two following examples illustrate both restrictions and possible applica- 
tion of the theory. 
As a first example let us consider the problem of finding an estimate F of the mean, ξ, of a normal universe. The mean of a distribution is a symmetric linear function of the variates of the distribution. For the class of functions from which to select an estimate F of ξ, let us take the class of all symmetric homogeneous linear functions of the observations x_i. Let

$$F = a(x_1 + x_2 + \cdots + x_n). \tag{2}$$

We wish to find the value of a, if any, for which I is a minimum.

F is the sum of n normally distributed independent variables a x_i, each with mean aξ and standard deviation aσ. F, therefore, has a distribution function

$$f(F) = C\, e^{-(F - na\xi)^2/(2na^2\sigma^2)},$$

where C is so chosen that ∫ f dF = 1. A discussion of general distribution functions may be found in Dunham Jackson’s article, “Theory of Small Samples,” in the American Mathematical Monthly, Volume XLII, 1935. In this case it can be shown without particular difficulty that

$$I = n a^2 \sigma^2 + \xi^2 (an - 1)^2.$$

To determine the minimum of I with respect to a, we set

$$\frac{dI}{da} = 2an\sigma^2 + 2\xi^2(an - 1)n = 0,$$

and obtain

$$a = \frac{\xi^2}{n\xi^2 + \sigma^2} = \frac{1}{n}\cdot\frac{1}{1 + \sigma^2/(n\xi^2)} = \frac{1}{n}\left(1 - \frac{\sigma^2}{n\xi^2} + \cdots\right). \tag{3}$$

It is seen that for even such a simple example as the estimation of the mean there is no estimate of the form of equation (2), with a independent of the parameter to be estimated, for which I takes its minimum value.

For a distribution in which ξ ≠ 0 and σ²/(nξ²) is small, a is given as a first approximation by 1/n. The function F is then merely the mean of the sample observations. If ξ = 0, the required solution is a = 0, and there is no best least-squares estimate of the type of equation (2).

In the case where σ²/(nξ²) is not small, as is apt to be the case when ξ is near zero, the determination of a desirable estimate by least squares requires a knowledge of the ratio σ²/ξ², which may perhaps be judged approximately in a special problem. If this value is assumed known, the required value of a may be found most easily by rewriting equation (3) in the form

$$a = \frac{1}{n + \sigma^2/\xi^2}. \tag{4}$$
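A small numerical check of (3) and (4) may help. The sketch below, an illustration rather than part of the original paper, evaluates the closed form I = na²σ² + ξ²(an − 1)², the minimizing a of (4), and a Monte Carlo estimate of the average square error for the same a. The function names, the sample size n = 5, and the values ξ = 1, σ = 2 are hypothetical choices.

```python
import random


def mean_square_error(a: float, n: int, xi: float, sigma: float) -> float:
    """I = n a^2 sigma^2 + xi^2 (a n - 1)^2 for F = a(x_1 + ... + x_n)."""
    return n * a ** 2 * sigma ** 2 + xi ** 2 * (a * n - 1) ** 2


def best_a(n: int, xi: float, sigma: float) -> float:
    """Equation (4): a = 1 / (n + sigma^2 / xi^2); requires xi != 0."""
    return 1.0 / (n + sigma ** 2 / xi ** 2)


def simulated_mse(a: float, n: int, xi: float, sigma: float,
                  trials: int = 50_000, seed: int = 1) -> float:
    """Monte Carlo average of (F - xi)^2 over samples from a normal universe."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        F = a * sum(rng.gauss(xi, sigma) for _ in range(n))
        total += (F - xi) ** 2
    return total / trials


n, xi, sigma = 5, 1.0, 2.0
for a in (1.0 / n, best_a(n, xi, sigma)):
    print(round(a, 4), mean_square_error(a, n, xi, sigma), simulated_mse(a, n, xi, sigma))
```

With these values the sample mean (a = 1/n = 0.2) has I = 0.8, while a = 1/9 gives I = 4/9 ≈ 0.444. The better a, however, depends on the unknown ratio σ²/ξ², which is the point made in the text.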


The second example to be considered is the determination of an estimate of σ² of a normal universe. A comparison with the definition of σ² suggests the use of a function F given by the equation

$$F = a\left[(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + \cdots + (x_n - \bar{x})^2\right], \tag{5}$$

where x̄ is the mean of the n observations. The value of a is, of course, to be determined by minimizing the integral I.

F is the sum of the squares of n normally distributed but not independent variables. It may be shown, however (Jackson, loc. cit.), to be expressible as the sum of the squares of n − 1 independent normally distributed variables, each with standard deviation √a σ. The distribution function for F takes the form

$$f(F) = C\, F^{(n-3)/2} e^{-F/(2a\sigma^2)}, \tag{6}$$

F taking only positive values, and C is again chosen to normalize f(F). The integral I may be written

$$I = C \int_0^\infty (F - \sigma^2)^2\, F^{(n-3)/2} e^{-F/(2a\sigma^2)}\, dF.$$

The integration is most easily accomplished by replacing F by u², and in terms of u

$$I = C' \int_0^\infty (u^2 - \sigma^2)^2\, u^{n-2} e^{-u^2/(2a\sigma^2)}\, du, \qquad C' = 2C.$$

The various steps in the integration will differ for even and odd values of n, but in each case the final result is the same. It is found that

$$I = \sigma^4\left\{a^2(n^2 - 1) - 2a(n - 1) + 1\right\}. \tag{7}$$

The value of a which minimizes I is determined from the relation

$$\frac{dI}{da} = \sigma^4\left\{2a(n^2 - 1) - 2(n - 1)\right\} = 0.$$

Dividing by 2σ⁴(n − 1), which is not zero in a sample of two or more observations, we obtain

$$a = \frac{1}{n + 1}. \tag{8}$$

In contrast to the previous example we have here an absolute minimum of I with respect to all estimates of the type of equation (5). The best least-squares estimate of this type is, therefore,

$$F = \frac{(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + \cdots + (x_n - \bar{x})^2}{n + 1}. \tag{9}$$
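Equation (7) can be checked the same way. The sketch below, again an illustration rather than the author's computation, compares the divisors n − 1, n and n + 1 in (5), both through (7) and by simulation from a normal universe, with hypothetical n = 5 and σ = 1. The function names and the random seed are our own choices.

```python
import random


def variance_estimate_mse(a: float, n: int, sigma: float = 1.0) -> float:
    """Equation (7): I = sigma^4 {a^2 (n^2 - 1) - 2 a (n - 1) + 1}."""
    return sigma ** 4 * (a ** 2 * (n ** 2 - 1) - 2 * a * (n - 1) + 1)


def simulated_variance_mse(a: float, n: int, sigma: float = 1.0,
                           trials: int = 50_000, seed: int = 7) -> float:
    """Monte Carlo average of (F - sigma^2)^2 with F = a * sum (x_i - xbar)^2."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        xs = [rng.gauss(0.0, sigma) for _ in range(n)]
        xbar = sum(xs) / n
        F = a * sum((x - xbar) ** 2 for x in xs)
        total += (F - sigma ** 2) ** 2
    return total / trials


n = 5
for label, a in (("1/(n-1)", 1 / (n - 1)), ("1/n", 1 / n), ("1/(n+1)", 1 / (n + 1))):
    print(label, round(variance_estimate_mse(a, n), 4), round(simulated_variance_mse(a, n), 4))
```

With σ = 1, (7) gives I = 2/(n − 1) = 0.5 for the divisor n − 1, I = (2n − 1)/n² = 0.36 for the divisor n, and I = 2/(n + 1) ≈ 0.333 for the divisor n + 1, so the divisor of (9) has the smallest average square error, as the text states.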

Pennsylvania State College,

State College, Pa. 



THE TEACHING OF STATISTICS¹

By Harold Hotelling 

The very great increase in the teaching of statistics since the First World 
War has been associated on one hand with the development of statistical theory. 
This important series of discoveries has made available more and more power- 
ful and accurate statistical methods, and has also acquired an intellectual 
interest of its own as embodying the modern version of the most important
part of inductive logic and as providing scope for mathematical and logical 
ingenuity of high order. The increased teaching of statistics has also been 
associated with the rapidly growing applications of statistics in innumerable 
fields, made possible by the development of the theory, by the availability of 
persons having some knowledge of the theory, and by an increasing realization 
of the possibilities of application. Doubtless most students of statistics enter 
upon the subject, not for its intrinsic interest, but with the idea of applying 
statistical methods as a tool to some particular end. This object may be 
scientific research, or to fulfill a requirement for a degree, but is often connected 
with some purely practical pursuit offering the ready prospect of a remunerative 
job. But it would be a mistake to ignore those whose interest is more purely 
intellectual, who desire an insight into the peculiar problems of probable in- 
ference and the structure of empirical knowledge, who wish to get a fundamental 
acquaintance with one of the most fundamental of subjects, to see and under- 
stand fully the mathematical derivations underlying so much practical and 
scientific activity, and perhaps to make their own contributions.

Of the magnitude of the demand for statisticians there can be no doubt. 
The realization of what statistical methods can do in a multitude of fields has 
gradually led the administrators of government agencies, directors of scientific 
organizations and research institutes, and business men, to employ rapidly 
increasing numbers of persons with some knowledge of statistical methods, and 
to accord an unusual degree of recognition and promotion in many such cases. 
The uses of statistical methods, and especially of sampling theory, are so varied 
that it is scarcely possible in a brief space to give any sort of survey of them. 
They enter, in one form or another, into the research work of the physicist, the 
chemist, the astronomer, the biologist, the psychologist, the anthropologist, 
the medical investigator, the economist, and the sociologist. Meteorology, 
which has lately acquired greatly increased importance, both civil and military, 
is with its masses of numerical observations very much a statistical matter. 
The engineer needs modern statistical methods both in the physical and in the

¹ Address at the meeting of the Institute of Mathematical Statistics at Hanover, N. H.,
September 10, 1940. 
economic aspects of his plans. The work of W. A. Shewhart has made clear
the central importance of sampling theory in the economic control of quality 
of manufactured articles. Business men who use sampling surveys to test 
the markets for their products and the effectiveness of their advertising, who 
employ statisticians to make up index numbers and forecasts of business condi- 
tions, and whose manufacturing costs and quality are controlled with the 
help of recently devised statistical methods, are finding more and more uses for 
statisticians. Indeed, it seems as if the exploitation of the business and manu- 
facturing possibilities of statistical methods has only begun, and that limitless 
further fields are coming into view. Insurance has of course always been essen- 
tially dependent on statistics. 

But the most rapidly growing large class of positions for statisticians is at 
present in governmental activities. For some facts regarding the employment 
of statisticians by the federal government I am indebted to Dr. J. M. Thomp- 
son. It appears that it has about one hundred agencies using statistics, with 
almost eight hundred positions broadly classified as statistical or mathematical, 
in addition to more than six thousand generally classified as economists. The 
title “economist” covers many types of work, but much of it is largely statis-
tical. The nature of the government’s statistical work is varied and extensive.
It includes such work as forecasting revenue from taxes, prices and production 
of agricultural commodities, general demand conditions, and weather. Some 
of the work consists in analyzing the effects of various taxes on other programs. 
In connection with proposed legislation, statisticians serving the lawmakers 
often attempt to outline the probable results of the legislation, as well as to 
assist in setting up definite formulae for carrying out the general policies aimed 
at in Acts of Congress. Administrators as well as lawmakers require statistical 
activities of a high order, exemplified in the Bureau of the Census, the Bureau 
of Agricultural Economics, and others. The scientific activities of the govern- 
ment, the work of the War Department, and many others that do not at first 
sight appear at all statistical, require the services of mathematical statisticians 
of high order. Even the judicial activities call for statistical theory of some 
of the most recently discovered kinds, as for instance in the investigation re- 
cently made of parole procedures. Cities and states, school and port authori- 
ties, employ numerous statisticians for other and widely diverse purposes. 

The growing need, demand and opportunity have confronted the educational 
system of the country with a series of problems regarding the teaching of statis- 
tics. Should statistics be taught in the department of agriculture, anthro- 
pology, astronomy, biology, business, economics, education, engineering, 
medicine, physics, political science, psychology, or sociology, or in all these 
departments? Should its teaching be entrusted to the department of mathe- 
matics, or to a separate department of statistics, and in either of these cases 
should other departments be prohibited from offering duplicating courses in 
statistics, as they are often inclined to do? To what students, and at what 
stage of their advancement, should a course in statistics be administered? 
Should there be mathematical or other prerequisites? How much of an in-
vestment in a statistical laboratory is warranted? Should courses be primarily
theoretical and mathematical, or should they be made as practical as possible, 
equipping the student in the shortest possible time for a job as statistician, or 
for statistical work in the field with which a particular department is con-
cerned? What about degrees in statistics? Eclipsing all these in importance,
though it seems to have received too little of the attention of college and uni-
versity administrative officers, is the question: What sort of persons should be
appointed to teach statistics?

To pressing practical problems answers are sure to be given either by con- 
sidered policy or by processes of historical evolution. The latter are the more 
prominent in explaining the statistical teaching we have had. A synoptic 
picture of the origins, not many decades ago, of a good deal of it would perhaps 
be something like this. A university Department of X, where X stands for 
economics, psychology, or any one of numerous other fields, begins to note 
toward the end of the pre-statistical era that some of the outstanding work 
in its field involves statistics. The quantity and importance of such work are 
observed to increase, while at the same time its intelligibility seems to diminish. 
Evidently students turned out with degrees in the field of X who do not know 
something about statistics are going to be handicapped, and are not likely to 
reflect credit on Alma Mater. The department therefore resolves that its 
students must acquire at least an elementary knowledge of the fundamentals 
of statistics. To implement this principle, it perhaps inserts some acquaint- 
ance with statistics among the requirements for a degree. This situation 
naturally calls for the introduction of a course in statistics. Accordingly the 
head of the Department of X, in preparing the next Announcement of Courses, 
writes: 

“X 82. Elements of Statistics. An elementary but thorough 
course designed to acquaint students of X with the fundamental con- 
cepts of statistics and their applications in the field of X. The view- 
point will be practical throughout. Second semester, MWF at 10. 

“Instructor to be announced.” 

The problem now arises of finding someone to teach the new course. The 
few well-known statisticians in the country have positions elsewhere from which 
it would be impossible to dislodge them with the bait to be offered; for though 
the department wishes to have statistics taught as an auxiliary to the study of 
X, it feels that there must be no question of the tail wagging the dog, and that 
economy is appropriate in this connection. The members of the department 
of professorial rank do not respond favorably to the suggestion that they should 
themselves undertake to teach the new and unfamiliar course. But every 
university department has a bright graduate student whose placement is an 
immediate problem. Young Jones has already demonstrated a quantitative turn 
of mind in the course on Money and Banking, or in the Ph.D. thesis on which 
he has already made substantial progress, dealing with The Proportion of
Public School Yard Areas Surfaced with Gravel. He may even recall having 
had a high-school course in trigonometry. His personality is all that might 
be desired. He is a white, Protestant, native-born American. And so the 
“Instructor to be announced” materializes as Jones.

This earnest young scholar now finds that, in addition to completing his 
thesis, he must look up the literature of statistics and prepare a course in the 
subject. His attention is directed by older members of the department to 
some of the research papers in the field of X involving statistics. He pursues 
“statistics” through the library card catalog and the encyclopedias. He reads
about census and vital statistics, price statistics, statistical mechanics. Per- 
haps he encounters probable errors. Eventually he learns that Karl Pearson 
is the great man of statistics, and that Biometrika is the central source of infor- 
mation. Unfortunately most of the papers in Biometrika and of Pearson’s 
writings, while not lacking in vigor, trail off into mathematical discourse of a 
kind with which young Jones feels ill at ease. What he wants is a textbook, 
couched in simple language and omitting all mathematics, to make the subject 
clear to a beginner. Perhaps he finds the impressive books of Yule and Bowley, 
but decides that they are too abstruse. Elderton’s “Frequency Curves and 
Correlation” is far too mathematical. Jones decides that a simple book on 
statistics must be written, and that he will do it if he can ever succeed in master- 
ing the subject. In the meantime, he contents himself perforce with the less 
mathematical writings of Karl Pearson, with applied examples in the field of X, 
and with such nonmathematical textbooks as may have been written by other 
young men who have earlier trod the same path as that on which Jones is now 
beginning. Somehow or other he gets the class through the course. After 
doing this two or three times, Jones is an experienced teacher of statistics, and 
his services are much in demand. His course expands, takes on a settled form, 
and after a while crystallizes into a textbook. At the same time he may be 
getting out some research, consisting of studies in the field of X in which statis- 
tical methods play a part. His promotion is rapid. He becomes a Professor 
of Statistics, and perhaps an officer in a national association. His textbook 
has a large sale, and is used as a source by other young men writing textbooks 
on statistics. 

The textbooks written in this way form an interesting literary cycle. Meas- 
ures of “central tendency” and of dispersion are introduced, and the use of 
one as against another of these measures is debated on every ground except 
the criterion that modern research has shown to be the important one, the 
sampling stability. Sampling considerations, indeed, get little attention. 
The urge to simplify by leaving out the more difficult parts of the subject, and 
especially the mathematical parts, is accompanied by pride in the great number 
of examples drawn from real life, that is, actual data that have been collected. 

But the most fascinating feature of this literary cycle is the opportunity it 
offers for research by the standard methods of literary investigation, tracing the 
influence of one author upon another through parallelism of passages, and so
forth. This study is facilitated by the accumulation of errors with repeated
copying. One outstanding example is in certain formulae connected with the
rank correlation coefficient, derived originally by Karl Pearson in 1907 and
copied from textbook to textbook without adequate checking back. As one 
error after another was introduced in this process, the formulae presented to 
students (and apparently made the basis of class exercises involving numerical 
substitution) became less and less like Pearson’s original equations. Inci- 
dentally, in trying to check this original work of Pearson's, recent investigation 
has raised the suspicion that it is erroneous; at any rate, he does not give a fully 
adequate argument. Thus it may be that the errors in copying, which are so 
useful in examining the history of statistics, never did any harm. The formulae 
in which the students were drilled may have been no worse than they would 
have been if all the copying had been done with more care. 

While this process has been going on in the Department of X, the Y and Z 
Departments have likewise evolved the teaching of statistics. There is some 
interchange of ideas between the various statisticians on the campus, and there 
is a catholicity in the copying of textbooks. But by and large, statistics is 
regarded in the Economics Department as a branch of economics, in the Psy- 
chology Department as a part of psychology, and so forth. The astronomer is 
inclined to resent the suggestion that his students should be called upon to study 
their least squares with anyone but an astronomer. Medical and biological 
investigators suspect Economics and Psychology of charlatanry, and do not 
look with favor on the idea of turning their own students over to such depart- 
ments for instruction in statistics. Most unthinkable of all would be putting 
the Department of Education in charge of an essential part of the training of 
scientific students. Thus the courses multiply. 

The fact that it is essentially the same fundamental subject that is being 
taught under various names and with various kinds of notation in different 
departments is often concealed by including the teaching of statistical theory 
in a course whose title and prospectus are more suggestive of applications. A 
case in point is that of an economist of my acquaintance, not primarily engaged 
in teaching, who some years ago was invited to give a course in Price Forecasting 
in the Economics Department of a leading university. He carefully prepared a 
series of lectures on this subject, which had been the center of some extended 
research he had conducted. A large class enrolled for the course. But soon 
after beginning his series of lectures the economist noticed that the class was 
growing restive. Upon inquiring what was amiss, he learned that his discourse 
was unintelligible to many of them because he was using technical statistical 
terms and concepts with which they were not familiar. He thereupon under-
took to use simpler language, and when this did not suffice to convey his mean-
ing, to explain the statistical notions involved in his work on price forecasting.
More and more his lectures came to deal with the elements of statistics, and less 
and less with price forecasting. At the end of the term he felt that he had 
given the students some elementary knowledge of statistical theory, for which
they had not enrolled and for which he did not feel particularly well qualified, 
but had taught them virtually nothing about price forecasting. When the 
invitation was repeated the next year, the economist suggested imposing a course 
in statistics as a prerequisite for the course in Price Forecasting. This, however,
was vetoed by the head of the Economics Department, who did not believe in 
prerequisites. The Price Forecasting course was not repeated. 

This incident illustrates the evolution of a good deal of statistical teaching. 
At the beginning, the idea is to teach some application, but the teacher soon 
finds himself engaged at much more length than expected with the fundamentals 
of statistical theory and methods. In this way it has come about that a large 
number of persons are teaching theoretical statistics who initially had no inten- 
tion of doing so, but were concerned with particular applications. The teach- 
ing of statistical theory has been undertaken belatedly and inexpertly because 
it was necessary to a discussion of some application originally in view. Thus 
it happens that a good deal of teaching of statistics, even of mathematical 
statistics, masquerades as something else. 

The obvious inefficiency of overlapping and duplicating courses given inde- 
pendently in numerous departments by persons who are not really specialists 
in the subject leads to the suggestion that the whole matter be taken over by the 
Department of Mathematics. This is a promising solution, but it is doomed to 
failure if, as has sometimes happened, it means that the teaching of statistics 
is put under the jurisdiction of those who have no real interest in it. Moreover 
the teaching of statistics cannot be done appreciably better by mathematicians 
ignorant of the subject than by psychologists or agricultural experimenters 
ignorant of the subject. The latter indeed have a certain advantage in that the 
problems seem more real and definite to them; they can sense the difference 
between the important and the unimportant questions, even if they cannot 
express the questions in clear mathematical language, and can sometimes arrive 
intuitively at a correct result that leaves the mathematician puzzled. Also, 
they can understand more readily than can the mathematician the examples, 
drawn largely from biological material, which play so important a part in some 
of the leading expository work on statistics, such as R. A. Fisher’s Statistical
Methods for Research Workers. The pure mathematician has only one advantage
over the non-mathematical worker in empirical fields: he is able to set about
reading the serious literature of statistical theory. But he must still find this 
scattered literature, sort it out from a mass of rubbish, fallacies, and false starts, 
and trace it back historically until he can understand the notation and the pre- 
suppositions. He must also contend with the fact that a good deal that is im- 
portant in statistics is still a matter of oral tradition, and some consists of lab- 
oratory techniques. In short, he needs a teacher before he himself sets out to 
teach the subject. When a Department of Mathematics calls in a young Ph.D., 
however brilliant, to teach statistics as a part or all of his program, the best 
thing it can do, if he has not already had a training in modern statistics, is to 
give him a furlough for a year or two to enable him to go where he can acquire
such a training. 

Qualifications of a good teacher of statistics include, first and foremost, a 
thorough knowledge of the subject. This statement seems trivial, but it has 
been ignored in such a way as to bring about the present unfortunate situation. 
Mathematicians and others, who deplore the tendency of Schools of Education 
to turn loose on the world teachers who have not specialized in the subjects they 
are to teach, would do well to consider their own tendency to entrust the teach- 
ing of statistics to persons who not only have not specialized in the subject, 
but have no sound knowledge of it whatever. A knowledge of theoretical 
statistics is not easy to obtain. There is no comprehensive treatise on the sub- 
ject, starting from first principles, and proceeding by sound deductions and 
well-chosen definitions to the methods that need to be used in practice. (I 
have been trying for years to write such a treatise, but it has turned out to be a 
bigger task than at first appeared. This is partly because some things formerly
thought to have been proved turn out, on critical examination, not to be sound, 
and much new research has been necessary.) The literature is scattered through 
journals pertaining primarily to many kinds of applications, and it is only in 
recent years that any large proportion of the current contributions to statistical 
theory and methods have been gathered into a few periodicals devoted to
statistical theory. On the other hand, the seeker after truth regarding statistical
theory must make his way through or around an enormous amount of trash 
and downright error. The great accumulation of published writings on
statistical theory and methods by authors who have not sufficiently studied the
subject is even more dangerous than the classroom teaching by the same people.

A good teacher of statistics needs of course a mathematical background, in- 
cluding at least an acquaintance with the theory of functions and n-dimensional 
euclidean geometry. A good deal of additional algebra and analysis is likely
to be helpful, as well as some differential geometry. But no amount of such 
mathematics constitutes by itself any approach to sufficiency in the qualifica- 
tions of a teacher of statistics. The most essential thing is that the man shall 
know the theory of statistics itself thoroughly from the ground up, including 
the mathematical derivations of proper methods and a clear knowledge of how 
to apply them in various empirical fields. In addition to the pure mathematics 
and the knowledge of statistical theory, a competent statistician or teacher of 
statistics needs a really intimate acquaintance with the problems of one or more 
empirical subjects in which statistical methods are applied. This is quite im- 
portant. Sometimes excellent mathematicians have wasted time and misled 
students through failure to get that feeling for applications that is necessary for 
proper statistical work. 

The theory of statistics has been making advances so rapid and so fundamental 
that some of the first things that need to be said in an elementary course, even 
for prospective practical statisticians, are affected by some of the most recent 
researches. So elementary a question as “What definition is it wise to give to 
the term ‘standard deviation’?”, which must be faced by every teacher of
Statistics 1, requires for an intelligent answer a rather thorough understanding
of modern sampling theory and techniques. The answer, it now seems, is
not the definition given in most textbooks. In the selection of a statistic to
represent a parameter, for example in fitting frequency curves or in linkage 
estimation in genetics, the fundamental consideration is connected with the 
sampling distribution, as R. A. Fisher showed in founding the modern theory of
estimation. This is ignored in most of the current teaching of statistics, with 
the result that innumerable students are sent out to waste the money and time 
of their employers by demanding larger samples than are necessary for the
purposes in view, wasting costly information by calculating inefficient statistics
and using tests that are not the most powerful. On the other hand, students of 
statistics who are taught rule-of-thumb methods without their derivations are 
never quite conscious of the exact limitations and assumptions involved, and 
may make unwarranted inferences from samples that are too small or in some 
way violate the conditions underlying the derivations of the formulae. 

A good teacher of statistics must be thoroughly familiar with these recent 
advances. He must examine very critically textbook statements unsupported 
by full proofs. Even though the students are not capable of following the 
complete mathematical argument (indeed, especially if the students are not to
examine it), the instructor needs to give it a critical study. The custom of
omitting proofs, which would not be tolerated in pure mathematics beyond 
a very limited extent, is common in the teaching of statistics, and is excused on 
the ground that the students do not know enough mathematics to understand 
the proofs. Perhaps in some cases a better reason is that the teachers, and the 
authors of the textbooks, do not understand the proofs. In some instances
no proofs exist, and in some instances no genuine proofs can exist, because the 
methods taught are demonstrably wrong. The custom prevalent in the teach- 
ing of mathematics of going over each proof carefully in the class is, among other 
things, a safeguard against infiltration of false propositions. This safeguard is 
missing from most of the teaching of statistics, and there has been an infiltration 
of errors. Since it is accepted that a great many students need to learn some- 
thing about statistical methods without learning enough mathematics to under- 
stand the proofs, it follows that the elementary teaching of statistics to these 
students must, if the perpetuation of gross errors is to be avoided, be in the 
hands of really competent mathematical statisticians. This is perhaps the 
greatest reform needed in the teaching of statistics today. Until the elementary
teaching of statistics is conducted by those with a thorough and critical knowl- 
edge of current research in statistical theory, of a sort that seems virtually 
inseparable from participation in that research, there is likely to be a continua- 
tion of the laborious drilling of thousands of students in methods that ought 
never to be used. Here, of all places, is the great need for participation of 
research workers in elementary teaching. 

Teachers and textbook writers might well abandon the idea of telling what 
statistical methods are used, and say instead what methods ought to be used.
But before they can do this with confidence they must have a very close
acquaintance with the research of the last three decades in statistical theory.

How can an appointing officer know whether a prospective teacher of statistics 
knows his subject? This question requires no answer peculiar to statistics in 
distinction from other subjects. Publication of research, constituting a contri- 
bution to the particular field, has always been accepted as the best proof. A 
substantial contribution to fundamental statistical theory, which is to be dis- 
tinguished from the mere application of known statistical methods to empirical 
data, is the best indication of the kind of scholarship appropriate to a teacher of 
statistics. 

Participation in research is not novel as a criterion of what constitutes a good 
teacher of a college or university subject, if the subject is Greek literature, 
physics, chemistry, biology, or indeed any of those departments that have been
long enough established to attain with respect to the organisation of their teach- 
ing a state approximating equilibrium. The more reputable institutions of 
higher learning have long maintained the principle, though with occasional 
violations in practice, that the Ph.D. degree or its equivalent, representing among 
other things the completion of a piece of scholarly research, is a minimum 
condition for a regular faculty appointment. It has usually been maintained 
also that the Ph.D. thesis should be a new contribution of a strictly scholarly 
character to the field of the scholar’s competence, and not merely a routine 
application of known methods to an extraneous field. Thus a thesis offered for 
the Ph.D. degree in mathematics would be judged by its contribution to
mathematics, rather than to physics or accounting. Moreover the regard in which
universities have held members of their faculties has been intimately connected 
with their output of scholarly research. Other criteria of excellence have not 
been ignored, but research has been recognized in a fairly consistent manner.
Some say that there has been an over-emphasis on research, and that more at- 
tention ought to be given to other qualities related to teaching. However 
this may be, the facts remain that scholarly research is something capable 
of a reasonably objective evaluation by scholars in the field, that it offers the 
main hope of fundamental progress, and that familiarity with current research 
is a necessary, though not sufficient, condition for the most important teaching 
in institutions of higher learning. 

A peculiarity of the teaching of statistics, of which in practice the theory of 
statistics is an essential even if unacknowledged part, is that a good deal of it 
has been conducted by persons engaged in research, not of a kind contributing to 
statistical theory, but consisting of the application of statistical methods and 
theory to something else. A similar situation would exist if the teaching of 
mathematics were in the hands of an assortment of various kinds of engineers, or 
if zoology and botany were taught by practicing physicians. The teaching of
mathematics and of elementary biology might perhaps gain in liveliness and
concreteness by such arrangements, with the accompanying emphasis on the
particular applications of the fundamental sciences. Moreover the engineer
might in the course of such teaching refresh his own knowledge of elementary 
mathematics, while the physician might gain by renewing his acquaintance with 
elementary biology. Such arrangements might occasionally be made with 
profit. But if they were the general rule the advantages of specialization would 
be lost; the fundamental sciences would not be developed in so well-rounded a 
manner as they are by specialists in them, while the special skills and knowledge 
of the physician and engineer could not be utilized to the full in their respective 
professions. Statistical theory is a big enough thing in itself to absorb the
full-time attention of a specialist teaching it, without his going out into applications
too freely. Some attention to applications is indeed valuable, and perhaps 
even indispensable as a stage in the training of a teacher of statistics and as a 
continuing interest. But particular applications should not dominate the 
teaching of the fundamental science, any more than particular diseases should 
dominate the teaching of anatomy and bacteriology to pre-medical students. 
These subjects are not ordinarily taught by practicing physicians, but by anat- 
omists and bacteriologists respectively. 

In medical education the principle has been accepted, after a long struggle, 
that a medical school should have full-time professors engaged primarily in 
teaching and research, and that such professors should not treat patients except 
in cases of unusual interest from the standpoint of the science or art of medicine. 
An analogous principle would be that an institution offering extensive instruc- 
tion in statistics should have full-time professors engaged in the teaching of and 
research in statistical theory and methods, without spending time over applied 
statistical problems excepting insofar as such problems might present novel 
features calling for the development of new statistical methods or theoretical 
extensions having interest going beyond the immediate case. Sometimes the 
complaint is heard in medical schools that the teaching tends to become too 
theoretical on account of detachment from clinical practice, and a similar diffi- 
culty might conceivably develop in connection with statistics; but in neither 
case does the trouble seem to be beyond the ability of the personnel involved to 
cure if they have the right background. 

A specialist in statistics on a university faculty has a threefold function. In 
addition to the usual duties of teaching and research, there is a need for him to 
advise his colleagues, and other research workers, regarding the statistical 
methods appropriate to their various investigations. The advisory function is 
a highly important one for the activities of the university as a whole, and should 
be taken into consideration in adjusting the teaching load. Probably every 
university statistician is visited from time to time by earnest research workers, 
deeply engrossed in their respective specialities, speaking technical jargons un- 
familiar to the statistician, and seeking his advice on matters concerning which 
he has a sinking feeling of lack of comprehension. After some hours of psycho- 
analyzing his visitor the statistician may be able to ascertain what it is he really 
wants to know, and thereafter either refer him to some standard formula, or 
more often, undertake a piece of new mathematical research designed to fit
the particular problem, and very possibly having value also for a more extended 
class of problems. The statistician is then very likely to find himself embarked 
on a co-operative research venture in a field that is new to him. 

To function well in this third function, the consultative or co-operative one, he
must have an unusually large store of general information. No one stands in 
greater need than he of that knowledge of “something about everything and 
everything about something” that was once said to be the goal of a liberal 
education. In planning the education of statisticians and teachers of statistics 
these considerations point to a somewhat wider diffusion of studies among vari- 
ous fields than is customary in many institutions, especially in graduate work. 
The co-operation, and their other work, would also be facilitated if research 
workers in general were more strongly urged to get a training in mathematical 
statistics at an early stage in their careers. 

The problem of departmental organization is secondary to that of getting men 
having the requisite qualities of extensive mathematical preparation, a thorough 
knowledge of modern theoretical statistics, an understanding of some fields at
least in which statistical methods can be applied, and the type of inquiring 
mind sometimes described as a “research outlook.” A Department of Mathe- 
matics may well handle the fundamental teaching in statistics, provided it has 
men properly qualified for such teaching. If it does not have such men, its 
teaching of statistics and its inability to provide the needed statistical advice 
will inevitably tempt the other departments to set up again their own duplicat- 
ing courses in what amounts essentially to statistical theory and methods, and 
to repeat the mistakes of the past. 

A separate Department of Statistics, if competently staffed, could very well 
provide advice for the whole institution as well as conducting elementary
instruction in statistical methods and theory, both for students having calculus
and for those without it, and should certainly carry on advanced teaching and 
research in statistical theory and methods. But for efficient functioning of the 
institution as a whole it should be agreed that the Department of Statistics or 
the Department of Mathematics should do all the elementary instruction in
statistics, and that courses in statistics in other departments should be confined 
to applications of the basic theory. Normally such courses in applied statistics 
in the other departments should require as a prerequisite one or more of the basic 
courses in the Department of Statistics, or of Mathematics. The basic course 
to be required as a prerequisite to others should be the one which itself requires 
calculus as a prerequisite wherever this is practicable. It is practicable for 
students of engineering, physics, astronomy, and mathematical economics, since 
these students must have calculus anyhow. Moreover the value of the se- 
quence consisting of calculus, statistical theory and applied statistics, in this 
order, is so great that many other students are likely to avail themselves of it 
when it is once established and the true nature and value of statistics are more 
widely understood. 

Exactly how far a Department of Statistics should go in particular
applications would have to be decided anew from time to time by its members in the
light of changing conditions and interests. It cannot teach everything that goes 
by the name of statistics. This problem may be exemplified by the case of 
population and vital statistics. This is a field with close connections with so- 
ciology, biology, medicine and insurance. It is cultivated in conjunction with 
each of these subjects in various places. Some of its most interesting and im- 
portant phases make use of quite advanced mathematics, as in the work of 
A. J. Lotka, and in addition there is extensive use, and more extensive need, of 
the statistical methods centered around sampling theory which are the appro- 
priate domain of a Department of Statistics. Should the study of population 
and vital statistics be included in a Department of Statistics? I think not, 
except as a temporary arrangement, or in a small institution, in spite of the
history of the word “statistics,” which originated in connection with material 
of this kind, and in one of its meanings is still applied to it. (My use of the 
unqualified word “statistics” in this paper is in the sense of theory and methods,
not in the sense of statistical facts such as those found by the census.) Medical, 
biological and sociological considerations are prominent in the problems of vital 
statistics, and one of these departments might well handle the subject. But 
the vital statistician, like other research workers, should have acquired in the 
course of his training an intimate familiarity with the statistical theory and
methods which are the appropriate province of a Department of Statistics. 
He also needs mathematics through integral equations, if he is to understand and 
extend the contributions of Lotka and Volterra. Students of vital statistics 
should have had an elementary course in statistical theory in the Department of 
Statistics, preferably the course requiring calculus. 

A course in price statistics should be taught by an economist, presumably in 
the Department of Economics, but might well require as a prerequisite the same 
elementary courses in statistical theory and methods as would be required in 
psychology, medicine and other fields. In addition, there are problems of time 
series analysis whose treatment calls for a mathematical statistician having some 
acquaintance with both economic and meteorological data. A course on the 
treatment of time series might appropriately be included in the Department of 
Statistics, requiring the general elementary course as a prerequisite, and itself 
serving as a prerequisite for courses in economic and meteorological statistics. 

One of the chief obstacles to efficient organization of teaching is the habit of 
not prescribing prerequisites outside one’s own department. But when once 
the elementary courses in statistics have become established in the hands of well- 
equipped specialists in statistical theory and methods, in whose competence 
general confidence can be reposed, the various departments of application will 
lose their motive for establishing their own duplicating courses, and will be able 
to cultivate more intensively their respective specialities. 

The detection of biases and the details of practical statistical work vary greatly 
from one application to another. These, consequently, are matters for the
departments concerned with applications rather than with the fundamentals of
statistics, and should not be the chief features of a course in elementary statis- 
tical methods and theory. The work of a Department of Statistics should be 
concerned largely with sampling theory, and should emphasize the unity of 
statistical methods and theory, regardless of the field of application. It should 
deal with statistics as a coherent science of inductive inference, of the prepara- 
tion of observations for inference, and of the planning of investigations so as to 
yield observations from which inferences can best be made. 

The question what mathematical prerequisites should be established for the 
fundamental course in statistical theory must be answered by a compromise 
between the ideal and what is expedient at a particular time and place. In 
Europe a large number of students have had a year of calculus before coming to
universities, that is, before reaching the age of eighteen. If a university were 
willing to restrict its entrants to such students (thus automatically solving the 
problem of overcrowding) it could give them another year of calculus, mixed 
perhaps with advanced algebra and geometry, and then in their sophomore year 
give them a thorough course in elementary statistics and probability, based on 
calculus. These students would then be ready to tackle advanced statistics in
the third year in a really effective way. If the teaching of economic theory, 
physics, chemistry and astronomy were geared to this program in such a way as 
to make real use of the calculus, the work in these subjects could be made far 
more efficient, in the sense that more material could be covered effectively in 
the allotted time, or an equivalent amount of material in less time. If, in
addition, all the many departments in which statistical methods and theory are used
required these statistical courses as prerequisites, and actually used the mate- 
rials of these courses in their work, there would be a further huge gain in effi- 
ciency. The baccalaureate degree of such an institution would represent a far 
more thorough knowledge, and command of the tools of research, than is possible 
without an arrangement putting in this way the fundamentals first. 

Institutions unwilling to undertake such a drastic improvement must face 
more or less delay and inadequacy in the acquisition by their students of the 
fundamentals of mathematics and of statistics. A division of the students into 
groups according to mathematical ability ought to be undertaken, and followed 
by a corresponding division of the elementary statistics course. Students having 
high mathematical ability could begin the study of statistics after completing 
calculus, and could look forward to rising ultimately to greater heights in pur- 
suits involving mathematical or statistical knowledge than those of lesser mathe- 
matical talents. For these latter there would still be the possibility of acquir- 
ing, even without calculus, useful statistical tools; but it is essential that this 
should be done under the guidance of instructors thoroughly familiar with the 
mathematics of statistics. The task of leading the blind must not be turned 
over to the blind. Students possessing the ability to master the calculus should 
be encouraged to begin the study of statistics with the course having calculus
as a prerequisite, and should not be put into the necessarily slower group not 
having the calculus. I believe that these elementary courses should begin with 
the theory of probability, but should go on to the chief distribution functions
used in practice, and should include applied problems and work on calculating 
machines. 

Putting a sound program of statistical teaching into effect will take time, 
partly because of the scarcity of suitable teachers of statistics. Nevertheless, 
the process is well under way, and the prospects are good for substantial im- 
provements in the teaching of statistics. A body of able young research men 
possessing the requisite knowledge of statistical fundamentals is now in existence 
and is growing. Some of the recent textbooks represent striking improvements. 
The Institute of Mathematical Statistics itself, with the Annals of Mathematical
Statistics, is perhaps the best evidence of a changed view making for better 
things. 

Columbia University,

New York, N. Y.


DISCUSSION OF PROFESSOR HOTELLING’S PAPER 
By W. Edwards Deming

It is a pleasure to endorse Professor Hotelling’s recommendations; in fact we 
have been following them pretty closely in the courses in the Graduate School 
of the Department of Agriculture. As a matter of fact, he has indirectly played 
an influential part in building up this set of courses, because some of our best 
instructors are his former students. 

Listening to Professor Hotelling’s paper, I was thinking of the possibility 
that some of his recommendations might be misunderstood. I take it that they 
are not supposed to embody all that there is in the teaching of statistics, because 
there are many other neglected phases that ought to be stressed. In the Bureau 
of the Census the population division alone has augmented its force by ap- 
proximately 3600 statistical clerks during the past six months. They come from 
diverse schools and it has been interesting to observe how many of them have the 
idea that all the problems of sampling and inference from data can be solved by 
what are commonly known as modern statistical techniques: correlation coefficients,
rank correlation coefficients, chi-square, analysis of variance, confidence
limits, and the like. Most of them are shocked to learn that many of
the so-called modern “theories of estimation” are not theories of estimation at
all, but are rather theories of distribution and are a disappointment to one who is 
faced with the necessity of making a prediction from his data, i.e., of basing 
some critical course of action on them. The conviction that such devices as
confidence limits and Student’s t provide a basis for action regardless of the 
size of the sample whence they were computed, even under conditions of statis- 
tical control, is too common a fallacy. On the other hand, many simple but 
worthy devices are neglected. A histogram, for instance, can be a genuine 
tool of prediction if it is built up layer by layer in different legends so as to
distinguish the different sources whence the data are derived. The modern student,
and too often his teacher, overlook the fact that such a simple thing as a scatter 
diagram is a more important tool of prediction than the correlation coefficient, 
especially if the points are labeled so as to distinguish the different sources of the 
data. Most students do not realize that for purposes of prediction the con- 
sistency or lack of it between many small samples may be much more valuable 
than any probability calculations that can be made from them or from the entire 
lot. Students are not usually admonished against grouping data from heterog- 
eneous sources. Of those that are not guilty of indiscriminate grouping, many 
are inclined to rely on statistical tests for distinguishing heterogeneity, rather 
than on a careful consideration of the sources of the data. Too little attention 
is given to the need for statistical control, or to put it more pertinently, since 
statistical control (randomness) is so rarely found, too little attention is given 
to the interpretation of data that arise from conditions not in statistical control. 

Nevertheless, the fundamentals of probability and sampling theory, and the 
mathematics of the distribution functions, though by themselves they do not 
qualify anyone for high-grade statistical work, are ultimately essential for pro- 
ficiency in statistics. Since they are seldom learned away from the university 
they are properly made the main theme of teaching. The university is the 
place to learn the studies that are so difficult to get outside of it. 

Above all, a statistician must be a scientist. The skepticism of many
first-class scientists of today toward modern statistical methods should be a challenge to
statistical teaching. A scientist does not neglect any pertinent information, 
yet students of statistics are often taught to do just the opposite of this, and are 
accused of being old-fashioned for daring to think of combining experience with 
the new information provided by a sample, even if it is a pitifully small one. 
Statisticians must be trained to do more than to feed numbers into the mill and 
grind out probabilities; they must look carefully at the data, and take account 
of the conditions under which each observation arises. It is my feeling that 
the chief duty of a statistician is to help design experiments in such a way 
that they provide the maximum knowledge for purposes of prediction; another 
is to compile data with the same object in view; and still a third function is 
to help bring about some changes in the source of the data. Scientific data 
are not taken merely for inventory purposes. There is no use taking data if 
you don’t intend to do something about the sources whence they arise. 

Bureau of the Census, 

Washington 

RESOLUTIONS ON THE TEACHING OF STATISTICS

The Institute of Mathematical Statistics at its business meeting on September 
11, 1940 at Dartmouth College adopted the following resolutions regarding the 
teaching of statistics. The resolutions were drawn up by a committee appointed 
by the President, and consisting of Burton H. Camp, W. Edwards Deming,
Harold Hotelling, and Jerzy Neyman. 

1. If the teaching of statistical theory and methods is to be satisfactory, it 
should be in the hands of persons who have made comprehensive studies of the 
mathematical theory of statistics, and who have been in active contact with 
applications in one or more fields. 

2. The judgment of the adequacy of a teacher’s knowledge of statistical 
theory must rest initially on his published contributions to statistical theory, in 
contrast with mere applications, in a manner analogous to that long accepted in 
other university subjects. 

3. These ideas are expressed in detail in the paper The teaching of statistics, 
by Professor Harold Hotelling, and the Institute decides to give both the 
resolution and the paper as wide a circulation as possible. 



REPORT OF THE HANOVER MEETING OF THE INSTITUTE

The sixth meeting of the Institute of Mathematical Statistics was held at 
Dartmouth College, Hanover, New Hampshire, Tuesday to Thursday, Sep- 
tember 10 to 12, 1940, in conjunction with meetings of the American Mathe- 
matical Society and of the Mathematical Association of America. The fol- 
lowing forty-two members of the Institute attended the meeting: 

H. E. Arnold, Felix Bernstein, G. W. Brown, J. H. Bushey, B. H. Camp, A. T. Craig,
A. R. Crathorne, J. H. Curtiss, J. F. Daly, W. E. Deming, J. L. Doob, Churchill Eisenhart,
M. L. Elveback, C. H. Fischer, M. M. Flood, R. M. Foster, T. C. Fry, H. P. Geiringer,
Robert Henderson, E. H. C. Hildebrandt, G. M. Hopper, Harold Hotelling, E. V. Huntington,
M. H. Ingraham, Dunham Jackson, W. L. Kichline, L. F. Knudsen, B. A. Lengyel,
W. G. Madow, J. W. Mauchly, Richard von Mises, E. B. Mode, Jerzy Neyman, P. S. Olmstead,
Oystein Ore, M. M. Sandomire, L. W. Shaw, F. F. Stephan, A. G. Swanson, Abraham Wald,
S. S. Wilks, Jacob Wolfowitz.

The meeting of the Institute consisted of four sessions. At the first session,
which was held on Tuesday morning, Professor Harold Hotelling of Columbia
University delivered an address on The Teaching of Statistics. This address
was followed by considerable discussion on the various aspects of the teaching
of statistics.¹ Preceding Professor Hotelling’s address, a short paper on an
Empirical Comparison of the “Smooth” test for goodness of fit with Pearson’s
Chi-Square test was presented by Professor J. Neyman of the University of
California.

Following Professor Hotelling’s address a business meeting of the Institute 
was held. At this time resolutions on the teaching of statistics were approved 
(see p. 472). The President reported that a War Preparedness Committee
had been appointed in the summer to study the matter of the Institute’s
participation in the national defense program.² The Chairman of this Committee
submitted a preliminary report which met the approval of the Institute. A 
plan was approved for completing the report and circularizing it with a minimum 
of delay. 

The matter of the organization of local sections or chapters of the Institute
was discussed but no action was taken. 

¹ Professor Hotelling’s address and the three resolutions regarding the teaching of statistics
that were adopted by the Institute at a business meeting following the address are
published in the present issue of the Annals of Mathematical Statistics, pp. 457-472.

² The membership of the Committee is as follows:

Professor Churchill Eisenhart (Chairman), University of Wisconsin. 

Professor A. T. Craig, University of Iowa. 

Professor E. G. Olds, Carnegie Institute of Technology. 

Captain Leslie E. Simon, Aberdeen Proving Ground. 

Mr. Ralph E. Wareham, General Electric Company. 
On Tuesday afternoon a session on contributed papers in Mathematical
Statistics was held jointly with the American Mathematical Society. Pro- 
fessor B. H. Camp of Wesleyan University presided and the following papers 
were presented: 

1. Contributions to the theory of the representative method of sampling. 

Dr. W. G. Madow, Department of Agriculture, Washington.

2. A generalization of the law of large numbers. 

Dr. Hilda P. Geiringer, Bryn Mawr College.

3. On the problem of two samples from normal populations with unequal variances.

Professor S. S. Wilks, Princeton University.

4. Experimental determination of the maximum of an empirical function. 

Professor Harold Hotelling, Columbia University. 

5. Asymptotically shortest confidence intervals. 

Dr. Abraham Wald, Columbia University. 

6. Reduction of certain composite statistical hypotheses. 

Dr. G. W. Brown, R. H. Macy and Company, Inc., New York. 

7. Conception of equivalence in the limit of tests and its application to certain λ- and
χ²-tests.

Professor J. Neyman, University of California. 

Abstracts of these papers follow this report. 

On Wednesday morning a session was held on The Theory of Probability 
with Dr. T. C. Fry of the Bell Telephone Laboratories, in the chair. The 
following addresses were given: 

1. On the foundations of probability theory. 

Professor R. von Mises, Harvard University. 

2. Probability as measure. 

Professor J. L. Doob, University of Illinois. 

This session was followed by an energetic discussion which was continued in an 
informal afternoon session. 

The Thursday morning session was devoted to the Theory of Statistical Esti- 
mation with Professor Harold Hotelling as Chairman. The following addresses 
were given: 

1. Estimation by intervals as a classical problem in probability. 

Professor J. Neyman, The University of California. 

2. Statistical estimation in large samples.

Dr. Joseph F. Daly, The Catholic University of America.

On Monday at 4:15 p.m. a tea was held at the Graduate Club for members 
of the mathematical organizations and their guests, and on Monday at 8:00 a 
musical performance was presented. On Tuesday at 7:00 p.m. a joint dinner 
was held for the mathematical organizations in Thayer Hall. Wednesday 
afternoon was devoted to an excursion to Franconia Notch. 

During the meeting a collection of string models of ruled surfaces was ex- 
hibited by Professor Robin Robinson of Dartmouth College and electrical 
calculation apparatus made from telephone equipment was exhibited by mem- 
bers of the staff of the Bell Telephone Laboratories. 



ABSTRACTS OF PAPERS 

(Presented on September 10, 1940, at the Hanover meeting of the Institute) 

Contributions to the Theory of the Representative Method of Sampling. 

William G. Madow, Washington, D. C. 

The theory of representative sampling may be regarded as a dual sampling process, the
first stage of which consists in the sampling of different random variables and the second of
which consists in repeating several times the experiments associated with each of the different
random variables. It follows that while the theory of sampling from finite populations
without replacement may be required for the first process, the second leads directly into
the theory of sampling from infinite populations. There is, however, one difference.
Although the usual theory is concerned with the evaluation of fiducial or confidence limits
for parameters, the theory of sampling is concerned with the evaluation of fiducial or
confidence limits for, say, the mean of a sample of N when n (N > n) of the values are known.

It is thus possible to use the usual theories of estimation in obtaining estimates of the 
parameters and to allow the effects of the subsampling process to show themselves in the
different values of the fiducial limits. It is shown that the limits obtained are almost 
identical with those obtained by the theory of sampling from a finite population. Distri- 
butions of the statistics used in these limits are derived. 

Besides these results, the theory is extended to the theory of sampling vectors, and
conditions are stated under which the “best” allocation of the number in a sample among several
strata is proportional to the kth roots of the generalized variance of a random vector
having k components.

A Generalization of the Law of Large Numbers. Hilda Geiringer, Bryn 
Mawr. 

Let $V_1(x), V_2(x), \ldots, V_n(x)$ be $n$ probability distributions which are not supposed to
be independent, and let $F(x_1, x_2, \ldots, x_n)$ be a “statistical function” of $n$ observations
in the sense of v. Mises, $V_i(x)$ $(i = 1, 2, \ldots, n)$ indicating as usual the probability of
getting a result $\le x$ at the $i$th observation. Then it can be proved that under fairly
general conditions $F(x_1, x_2, \ldots, x_n)$ converges stochastically toward its “theoretical
value,” or in other words, that under these general conditions a great class of statistics
$F(x_1, x_2, \ldots, x_n)$ is “consistent” in the sense of R. A. Fisher.

Well-known particular cases of this theorem result if (a) we take for $F(x_1, x_2, \ldots, x_n)$
the average $(x_1 + x_2 + \cdots + x_n)/n$ of the $n$ observations and (b) we assume that the $V_i(x)$
are independent distributions.
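The classical case in which both (a) and (b) hold can be illustrated by simulation. The sketch below is only an assumed example, not Geiringer’s construction: the hypothetical distributions $V_i$ are independent uniform laws on $[m_i - 1, m_i + 1]$ with $m_i = \sin i$, so no two of them coincide, and the “theoretical value” of the average is taken to be the average of the means $m_i$.

```python
import math
import random


def simulate_average(n, seed=1):
    # Hypothetical V_i: the i-th observation is uniform on [m_i - 1, m_i + 1]
    # with m_i = sin(i), so the n distributions all differ.
    rng = random.Random(seed)
    total = 0.0
    theoretical = 0.0
    for i in range(1, n + 1):
        m_i = math.sin(i)
        total += rng.uniform(m_i - 1.0, m_i + 1.0)
        theoretical += m_i
    # Sample average, and its "theoretical value": the average of the means m_i.
    return total / n, theoretical / n


for n in (100, 10_000, 200_000):
    average, theoretical = simulate_average(n)
    print(n, abs(average - theoretical))  # the gap tends to shrink as n grows
```

Dropping assumption (b), as the theorem allows, would require a dependent sampling scheme; the sketch does not attempt that.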

On the Problem of Two Samples from Normal Populations with Unequal Vari- 
ances. S. S. Wilks, Princeton University. 

Suppose $O_{n_1}$ and $O_{n_2}$ are samples of $n_1$ and $n_2$ elements from normal populations $\pi_1$ and
$\pi_2$ respectively. Let $a_1, \sigma_1^2$ and $a_2, \sigma_2^2$ be the means and variances of $\pi_1$ and $\pi_2$, and let
$O_{n_1}$ and $O_{n_2}$ have means $\bar{x}_1$ and $\bar{x}_2$ and variances $s_1^2$ and $s_2^2$ (unbiased estimates of $\sigma_1^2, \sigma_2^2$),
respectively. It is shown that there exists no (Borel measurable) function of $\bar{x}_1, \bar{x}_2$,
$s_1, s_2, a_1 - a_2$, independent of $\sigma_1$ and $\sigma_2$, having its probability law independent of the
four population parameters. It is therefore impossible to obtain exact confidence limits
for $a_1 - a_2$ corresponding to a given confidence coefficient. Functions of the four parameters
and four statistics are devised from which one can set up confidence limits for $a_1 - a_2$
with associated confidence coefficient inequalities.

Experimental Determination of the Maximum of an Empirical Function. 

Harold Hotelling, Columbia University. 

In physical and economic experimentation to determine the maximum of an unknown 
function, for example of a monopolist's profit as a function of price, or of the magnetic 
permeability of an alloy as a function of its composition, the characteristic procedure is to 
perform experiments with chosen values of the argument x, each of which then yields an
observation, subject to error, on the corresponding functional value y = f(x). The values
of x need, however, to be chosen on the basis of earlier experiments in order to make the
determination efficient. The experimentation properly proceeds, therefore, in successive 
stages, with the values used at each stage determined with the help of the earlier work. 
The question what distribution of x as a function of previous results should be used is 
discussed in this paper on the basis of various hypotheses regarding the function, and 
further criteria. In particular, a conflict is shown to exist under some conditions between 
the criterion of minimum sampling variance and that calling for absence of bias. 
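The staged procedure can be sketched as follows. This is an illustrative assumption, not Hotelling’s method: the abstract does not specify a design rule, so the sketch uses a hypothetical quadratic response with its maximum at x = 3, a three-point design at each stage, and parabolic interpolation through the stage means to choose the centre of the next stage.

```python
import random


def true_response(x):
    # Hypothetical "unknown" function; its maximum is at x = 3.
    return 10.0 - (x - 3.0) ** 2


def observe(x, rng, sigma=0.1):
    # One experiment at argument x: the functional value plus normal error.
    return true_response(x) + rng.gauss(0.0, sigma)


def parabola_vertex(c, h, y_minus, y_0, y_plus):
    # Vertex of the parabola through (c-h, y_minus), (c, y_0), (c+h, y_plus).
    curvature = y_minus - 2.0 * y_0 + y_plus
    if curvature >= 0.0:
        # Not concave: no interior maximum, so step toward the larger end.
        return c + h if y_plus > y_minus else c - h
    return c + h * (y_minus - y_plus) / (2.0 * curvature)


def staged_search(c, h, stages, reps, seed=0):
    # Each stage centres a three-point design on the previous estimate.
    rng = random.Random(seed)
    for _ in range(stages):
        means = []
        for x in (c - h, c, c + h):
            ys = [observe(x, rng) for _ in range(reps)]
            means.append(sum(ys) / reps)
        c = parabola_vertex(c, h, *means)
        h /= 2.0  # narrow the design once the maximum is roughly located
    return c


estimate = staged_search(c=0.0, h=2.0, stages=3, reps=20)
print(f"estimated maximizer: {estimate:.3f}")
```

Halving the spacing h is itself a choice with the trade-off the abstract mentions: a narrow design sits close to the maximum, but the curvature it sees is smaller, so the sampling variance of the vertex estimate grows.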

Asymptotically Shortest Confidence Intervals. Abraham Wald, Columbia
University.

Let $f(x, \theta)$ be the probability density function of a variate $x$ involving an unknown
parameter $\theta$. Denote by $x_1, \ldots, x_n$ $n$ independent observations on $x$ and let $C_n(\theta)$ be a
positive function of $\theta$ such that the probability that

$$-C_n(\theta) < \frac{1}{\sqrt{n}} \sum_{\alpha=1}^{n} \frac{\partial}{\partial\theta} \log f(x_\alpha, \theta) < C_n(\theta)$$

is equal to a given constant under the assumption that $\theta$ is the true value of the parameter.
Denote by $\theta'(x_1, \ldots, x_n)$ the root in $\theta$ of the equation

$$\frac{1}{\sqrt{n}} \sum_{\alpha=1}^{n} \frac{\partial}{\partial\theta} \log f(x_\alpha, \theta) = C_n(\theta)$$

and by $\theta''(x_1, \ldots, x_n)$ the root of

$$\frac{1}{\sqrt{n}} \sum_{\alpha=1}^{n} \frac{\partial}{\partial\theta} \log f(x_\alpha, \theta) = -C_n(\theta).$$

Under some weak assumptions on $f(x, \theta)$ the interval $\delta_n(x_1, \ldots, x_n) = [\theta'(x_1, \ldots, x_n), \theta''(x_1, \ldots, x_n)]$
is, in the limit as $n \to \infty$, a shortest unbiased confidence interval¹ of $\theta$ corresponding to
that constant as confidence coefficient. This confidence interval is identical with that given by S. S.
Wilks in his paper “Shortest average confidence intervals from large samples,” The Annals
of Mathematical Statistics, Sept. 1938. Wilks has shown that $\delta_n(x_1, \ldots, x_n)$ is asymptotically
shortest on the average compared with all confidence intervals computed on the
basis of statistics belonging to a certain class C. In the present paper it has been proved
that the confidence interval in question is asymptotically shortest compared with any
arbitrary unbiased confidence interval, without any restriction to a certain class of
functions.
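As an illustration only, the sketch below applies the construction to a case the abstract does not discuss: a Poisson variate, for which the scaled score is $\sqrt{n}(\bar{x} - \theta)/\theta$. The choice $C_n(\theta) = z/\sqrt{\theta}$, i.e. z times the square root of the Fisher information $1/\theta$ per observation, is our large-sample assumption; it makes the probability of the double inequality tend to the normal coverage for z. The two roots $\theta'$ and $\theta''$ (here `theta_1` and `theta_2`) are found by bisection, and the data are hypothetical.

```python
import math

Z = 1.959963984540054  # standard normal point for a two-sided 95% interval


def scaled_score(theta, xs):
    # (1/sqrt(n)) * sum over observations of d/dtheta log f(x, theta),
    # for the Poisson law f(x, theta) = exp(-theta) * theta**x / x!.
    return sum(x / theta - 1.0 for x in xs) / math.sqrt(len(xs))


def c_n(theta, z=Z):
    # Assumed large-sample choice: z times the square root of the
    # Fisher information per observation, which is 1/theta for Poisson.
    return z / math.sqrt(theta)


def bisect_root(g, lo, hi, iters=200):
    # Plain bisection; assumes g(lo) > 0 > g(hi) and a single sign change.
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if g(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def score_interval(xs, z=Z, lo=1e-9, hi=1e6):
    # Requires sum(xs) > 0 so that both equations have a positive root.
    theta_1 = bisect_root(lambda t: scaled_score(t, xs) - c_n(t, z), lo, hi)
    theta_2 = bisect_root(lambda t: scaled_score(t, xs) + c_n(t, z), lo, hi)
    return theta_1, theta_2


xs = [3, 5, 2, 4, 6, 3, 4, 5, 2, 6]  # hypothetical counts, mean 4
print(score_interval(xs))
```

For this model the two equations reduce to $(\bar{x} - \theta)^2 = z^2\theta/n$, so the roots can be checked against the closed form $\bar{x} + z^2/(2n) \pm z\sqrt{\bar{x}/n + z^2/(4n^2)}$.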

Reduction of Certain Composite Statistical Hypotheses. George W. Brown, 
R. H. Macy and Co., New York.

The results obtained make it possible to reduce a large class of composite statistical 
hypotheses to equivalent simple hypotheses. The fundamental theorem established states 
essentially that if two distributions give rise, in sampling, to the same distribution of the 

¹ For the definition of a shortest unbiased confidence interval see the paper by J. Neyman,
“Outline of a theory of statistical estimation based on the classical theory of probability,”
Phil. Trans. Roy. Soc. (1937).
set of differences between observations, then one distribution must be a translation of the
other, subject to a condition requiring that the characteristic function of one of the
distributions be such that any interior intervals of zeros be not too large. The result is
established by means of a functional equation relating the characteristic functions.
Similar results are obtained for scale, and for a combination of location and scale, and for
the corresponding situations in multivariate distributions. This type of uniqueness theorem
permits one to reduce a composite hypothesis
involving an unknown location parameter (or scale, or both) to an equivalent simple 
hypothesis. 

Conception of Equivalence in the Limit of Tests and Its Application to Certain
λ- and χ²-Tests. J. Neyman, University of California.

Consider a system of observable variables and denote by N the number of independent
observations of those variables to be used for testing a certain statistical hypothesis H
against a set Ω of admissible simple hypotheses h. Let further $T_1(N)$ and $T_2(N)$ be two
different tests of H using the same number N of observations. Consider the probability
$P_N(h)$, calculated on any admissible simple hypothesis h, of the two tests contradicting
each other.

Definition: If, whatever be $h \in \Omega$, the probability $P_N(h)$ tends to zero as N is indefinitely
increased, then the two tests are said to be equivalent in the limit.

Consider a number $s$ of series of independent trials and denote by $E_{i1}, E_{i2}, \ldots, E_{im_i}$
all the $m_i$ possible and mutually exclusive outcomes of each of the trials forming the $i$th
series. Let $p_{ij}$ be the probability of $E_{ij}$, $n_i$ the total number of trials in the $i$th series,
and $n_{ij}$ the number of these which give the outcome $E_{ij}$.

Suppose that it is desired to test a composite hypothesis H concerning all the
probabilities $p_{ij}$ and consisting of the assumption that any one of them is a given linear function
of some $t$ independent parameters $\theta_k$, so that

$$p_{ij} = a_{ij0} + a_{ij1}\theta_1 + \cdots + a_{ijt}\theta_t, \qquad (1)$$


where the coefficients $a_{ijk}$ are known. The main result of the paper is then that the λ-test
of the above hypothesis H, tested against the set Ω of alternatives ascribing to the $p_{ij}$
any non-negative values, is equivalent in the limit to the test consisting of rejecting H
when the minimum of the expression

$$\sum_{i=1}^{s} \sum_{j=1}^{m_i} \frac{(n_{ij} - n_i p_{ij})^2}{n_{ij}}, \qquad (2)$$

calculated with respect to unrestricted variation of the θ’s, exceeds the tabled value of χ²
corresponding to the chosen level of significance and to the number of degrees of freedom

$$\sum_{i=1}^{s} m_i - s - t.$$

It will be noticed that the expression (2) differs from the usual χ² in the denominator
of each term.
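The difference in the denominators can be seen numerically. The sketch below takes the simplest case, one series (s = 1) with fully specified probabilities (t = 0), so no minimization over θ is needed; the counts and probabilities are hypothetical.

```python
def pearson_chi2(counts, probs):
    # Usual chi-square: (observed - expected)^2 divided by the expected count.
    n = sum(counts)
    return sum((c - n * p) ** 2 / (n * p) for c, p in zip(counts, probs))


def neyman_chi2(counts, probs):
    # Expression (2) for a single series: the denominator is the observed
    # count n_ij, so every cell must contain at least one observation.
    n = sum(counts)
    return sum((c - n * p) ** 2 / c for c, p in zip(counts, probs))


counts = [30, 50, 20]
probs = [0.25, 0.5, 0.25]
print(pearson_chi2(counts, probs))  # 2.0
print(neyman_chi2(counts, probs))   # 25/30 + 25/20, about 2.083
```

With the observed counts in the denominators, expression (2) is a quadratic function of the $p_{ij}$, which is why its minimum under the linear hypothesis (1) reduces to solving linear equations.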

As an example of the application of the test based on (2), consider the case where M
varieties of sugar beet are tested for resistance to a certain disease in an experiment
arranged in N randomized blocks. Denote by n the number of beets selected at random
for inspection from each plot and by $n_{ij}$ the number of those of the $i$th variety from the
plot in the $j$th block which are found to be infected. Denote further by $p_{ij}$ the proportion
of infected beets of the $i$th variety in the plot in the $j$th block. The hypothesis that the
effects of variety and of block are additive is expressed by $p_{ij} = p + V_i + B_j$ with
$\sum V_i = \sum B_j = 0$. Each plot forms a series with two outcomes (infected or not), so that
$s = MN$, every $m_i = 2$, and $t = M + N - 1$. To test this hypothesis we may use (2), which in
this particular case reduces to

$$\chi^2 = \sum_{i=1}^{M} \sum_{j=1}^{N} w_{ij}\,(p'_{ij} - p - V_i - B_j)^2 \qquad (3)$$

with $w_{ij} = n^3/\{n_{ij}(n - n_{ij})\}$ and $p'_{ij} = n_{ij}/n$. The minimum $\chi_0^2$ of $\chi^2$ is found by solving a
set of equations which are linear in $p$, $V_i$, $B_j$, and the comparison of $\chi_0^2$ with the tabled
value corresponding to $(M - 1)(N - 1)$ degrees of freedom will tell us whether we are
likely to be very wrong in assuming additivity or not. In the favorable case we may
next proceed similarly to test another hypothesis, that there is no differentiation between
the varieties, so that $V_1 = V_2 = \cdots = V_M = 0$.
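Minimizing (3) is a weighted least-squares problem, so the linear equations can be set up and solved directly. The sketch below is one way to do it under our own assumptions: it parameterizes the model by $p, V_1, \ldots, V_{M-1}, B_1, \ldots, B_{N-1}$ (with $V_M$ and $B_N$ fixed by the zero-sum conditions), uses a small hand-written Gaussian elimination, and runs on hypothetical, exactly additive counts, for which $\chi_0^2$ should vanish.

```python
def solve_linear(a, b):
    # Gaussian elimination with partial pivoting; a is square, b a vector.
    n = len(b)
    m = [row[:] + [b_i] for row, b_i in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= f * m[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def design_row(i, j, M, N):
    # Coefficients of (p, V_1..V_{M-1}, B_1..B_{N-1}) in p + V_i + B_j,
    # using V_M = -(V_1 + ... + V_{M-1}) and B_N = -(B_1 + ... + B_{N-1}).
    row = [1.0] + [0.0] * (M - 1) + [0.0] * (N - 1)
    if i < M - 1:
        row[1 + i] = 1.0
    else:
        for k in range(M - 1):
            row[1 + k] = -1.0
    if j < N - 1:
        row[M + j] = 1.0
    else:
        for k in range(N - 1):
            row[M + k] = -1.0
    return row


def min_chi2_additive(counts, n):
    # Minimize sum w_ij (p'_ij - p - V_i - B_j)^2, w_ij = n^3 / (n_ij (n - n_ij)).
    M, N = len(counts), len(counts[0])
    q = M + N - 1
    xtwx = [[0.0] * q for _ in range(q)]
    xtwy = [0.0] * q
    cells = []
    for i in range(M):
        for j in range(N):
            nij = counts[i][j]
            w = n ** 3 / (nij * (n - nij))  # requires 0 < n_ij < n
            y = nij / n
            x = design_row(i, j, M, N)
            cells.append((w, y, x))
            for a in range(q):
                xtwy[a] += w * x[a] * y
                for b in range(q):
                    xtwx[a][b] += w * x[a] * x[b]
    beta = solve_linear(xtwx, xtwy)
    chi2 = sum(w * (y - sum(xa * ba for xa, ba in zip(x, beta))) ** 2
               for w, y, x in cells)
    return chi2, (M - 1) * (N - 1), beta


# Hypothetical additive proportions: M = 3 varieties, N = 4 blocks, n = 100.
p0, V, B = 0.3, [0.05, 0.0, -0.05], [0.02, -0.01, 0.03, -0.04]
n = 100
counts = [[round(n * (p0 + v + b)) for b in B] for v in V]
chi2, df, beta = min_chi2_additive(counts, n)
print(chi2, df)  # chi2 essentially 0 with 6 degrees of freedom
```

Real counts would then be referred to the χ² table with `df` degrees of freedom; a plot with no infected beets, or with all beets infected, makes its weight undefined and would need separate treatment.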

Empirical Comparison of the “Smooth” Test for Goodness of Fit with
Pearson’s χ² Test. J. Neyman, University of California.

In a previous publication¹ the author has deduced a test for goodness of fit, described
as the “smooth test” or the ψ² test, applicable to cases where the hypothesis tested, H,
is simple. The test is so devised as to be particularly sensitive to departures from H
which are “smooth” in the sense explained in detail in the publication quoted. Whether
the test so devised does present any advantage over the usual χ² test depends on how
frequently we meet, in practice, cases where the hypotheses alternative to the one tested
are actually smooth.
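The abstract does not restate the construction, so the sketch below follows the usual textbook form of Neyman’s smooth test, which we assume here. Each observation is first transformed by the hypothesized distribution function, y = F(x), so that under H the y’s are uniform on [0, 1]. The test then computes $u_r = n^{-1/2}\sum \pi_r(y)$ for orthonormal Legendre polynomials $\pi_r$ on [0, 1] and refers $\psi^2 = u_1^2 + \cdots + u_k^2$ to χ² with k degrees of freedom. Order k = 4 corresponds to the fourth-order test used in this study; the two data sets are hypothetical.

```python
import math

# Orthonormal (shifted) Legendre polynomials on [0, 1], degrees 1 to 4.
LEGENDRE = [
    lambda y: math.sqrt(3) * (2 * y - 1),
    lambda y: math.sqrt(5) * (6 * y**2 - 6 * y + 1),
    lambda y: math.sqrt(7) * (20 * y**3 - 30 * y**2 + 12 * y - 1),
    lambda y: 3.0 * (70 * y**4 - 140 * y**3 + 90 * y**2 - 20 * y + 1),
]

CHI2_4DF_05 = 9.487729036781154  # upper 5% point of chi-square, 4 d.f.


def smooth_psi2(ys, order=4):
    # psi^2 for H: the y's (= F(x) under H) are uniform on [0, 1].
    n = len(ys)
    total = 0.0
    for pi_r in LEGENDRE[:order]:
        u_r = sum(pi_r(y) for y in ys) / math.sqrt(n)
        total += u_r ** 2
    return total


n = 100
uniform_like = [(i + 0.5) / n for i in range(n)]
skewed = [((i + 0.5) / n) ** 2 for i in range(n)]
print(smooth_psi2(uniform_like))              # close to 0
print(smooth_psi2(skewed) > CHI2_4DF_05)      # True: H rejected at the .05 level
```

The low-degree polynomials respond to shifts in location, scale, skewness and kurtosis of the transformed values, which is the sense in which the alternatives the test is built to detect are smooth.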

The present investigation was undertaken with the object of obtaining some information 
on this point. For that purpose a number of cases described in the literature where there 
was a question of testing that some observable variable x follows some perfectly specified 
distribution p(x) were analyzed. Of all such cases, the ones where there were a priori 
theoretical reasons to believe that p{x) could not possibly represent the true distribution 
of x and, at the most, it could be considered as only an approximation to the true distri- 
bution were selected. 

It was assumed that the departures from the hypothetical distributions are typical of
those that may be met in practice when no definite information as to the actual state of
affairs is available. The hypothesis of goodness of fit was tested both by means of the
χ² test and by the fourth-order smooth test. Out of the 130 cases studied, the two tests were
in perfect agreement eight times. Out of the remaining 122 cases, the smooth test proved
to be more sensitive than the χ² test in 70 cases and the χ² test better than the smooth test
in 52 cases. We may further compare the tests by counting those cases where one of them
detected the falsehood of the hypothesis tested at a given level of significance while the
other failed to do so. At the level of significance .05 the χ² test rejected the hypothesis
tested 13 times while the smooth test gave P > .05. The reverse was true in 17 cases. At the
level of significance .01 the corresponding figures are 5 and 14, again in favor of the smooth test.

¹ J. Neyman, “‘Smooth Test’ for Goodness of Fit,” Skandinavisk Aktuarietidskrift,
1937, pp. 149-199.



REPORT OF THE WAR PREPAREDNESS COMMITTEE OF THE 
INSTITUTE OF MATHEMATICAL STATISTICS 


The generally recognized functions of a statistician are the calculation of 
averages, percentages, and index numbers; the construction of bar graphs and
pie diagrams; and the compilation of data in general. His other activities 
are less widely known. In particular, the recent advances in mathematical sta- 
tistics are known to a relatively small proportion of the persons occupying 
responsible positions in academic life, in industry, and in government. The 
mathematical statistician, in fact, is concerned chiefly with the interpretation 
of data through the use of probability theory; his is the science of reasoning 
from a part to the whole, and of prediction; and to him falls the task of stating 
the conditions under which such inferences are possible, of devising means of 
testing whether these conditions are satisfied, and of evaluating the probability
that such ‘uncertain inferences’ are correct in specific instances. Furthermore,
it is his responsibility to so plan the lay-out of experiments and the
conduct of surveys that the data they yield will contain the maximum informa- 
tion on the points at issue and be amenable to unambiguous statistical 
interpretation. 

Because of the functions which the mathematical statistician can perform, his
services should be of value to the National Defense Program in the following 
fields: 

I. Quality Control and Specification. The functions of a mathematical 
statistical nature connected with quality control and specification of articles 
produced by mass production are: 

(1) Tests of randomness. These are important because statistical methods 
of inference are strictly valid only for random samples. 

(2) The use of probability theory in predicting the outcome of future repetitions
of an operation which is in a state of statistical control.¹ The evaluation of the
probability that the quality of a piece of product will lie within any previously
specified tolerance limits as long as a state of statistical control is maintained,
and the development of sampling inspection techniques, are examples of this
function.

¹ A repetitive operation, such as a production process, is said to be in a state of statistical
control when it produces a sequence of observations which exhibit the property of
randomness. An important aspect of quality control is the improvement of quality which comes
as the result of an effort to reduce a manufacturing process to a state of statistical control.
Furthermore, when this state of control is attained it is possible to gain a reduction in 
cost of inspection, a reduction in cost of rejections, a reduction in tolerance limits where 
quality measurement is indirect, and the attainment of uniform quality even though the 
inspection test is destructive. 
(3) Representative sampling. When a repetitive operation such as a production
process is not in a state of statistical control, it is not possible to make
valid inferences about the quality of a lot from an examination of a sample 
from the lot unless the sampling process is one of random selection within 
“strata” in accordance with the principles of representative sampling.

(4) Analysis of variance. Reference is made here to the technique whereby
the total variability of a product of an operation which is in a state of statis- 
tical control can be decomposed into components associated with the various 
sub-operations involved. 

(5) Correlation methods. When a direct measurement of quality is extremely 
costly, it is sometimes advisable to use as an indirect measurement of quality 
the value of some character less costly to measure which is highly correlated 
with quality. 

(6) Specification of quality as a variable. Statistical theory, including tests 
for randomness, must be taken into account in writing quality specifications if 
the consumer is to be protected against the vagaries of sampling and the pro- 
ducer safeguarded from the incurring of penalties of an unjust chance. 
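For item (2) above, the probability that a piece of product falls within tolerance limits can be computed once a model for the controlled process is adopted. The sketch below assumes, as the report does not, that the measurements of a process in statistical control are normally distributed with known mean and standard deviation; the process figures are hypothetical.

```python
from statistics import NormalDist


def fraction_within_tolerance(mean, sd, lower, upper):
    # Assumes the controlled process yields normally distributed measurements
    # with the given mean and standard deviation.
    dist = NormalDist(mu=mean, sigma=sd)
    return dist.cdf(upper) - dist.cdf(lower)


# Hypothetical process: mean 10.0, sd 1.0, tolerance limits 8.0 to 12.0.
print(round(fraction_within_tolerance(10.0, 1.0, 8.0, 12.0), 4))  # 0.9545
# The same limits with the process mean shifted to 11.0.
print(round(fraction_within_tolerance(11.0, 1.0, 8.0, 12.0), 4))
```

The prediction is meaningful only while the state of control persists; a shift in the process mean, as in the second call, changes the answer at once.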

II. Sampling Surveys. The importance of conducting sampling surveys 
in accordance with the principles of representative sampling is well established. 
It is quite possible that such surveys and partial censuses will be needed in 
connection with the National Defense Program in order to determine the 
frequency and location of individuals possessing special traits, e.g. persons 
capable of withstanding the rigours of dive bombing, or persons possessing 
types of color blindness which render them valuable as observers who can 
detect camouflage, etc. The “problem of sizes” connected with Stores and
Supplies — see below — may require careful preliminary surveys. Also, surveys 
may be needed to evaluate the effects of various types of propaganda. 

III. Experimentation of Various Kinds. The mathematical statistician 
can be of service in connection with experimentation of various kinds under- 
taken as a part of the National Defense Program since the following aspects 
of experimentation are of a mathematical statistical nature: 

(1) Randomization. Since statistical tests for the existence of differences 
between samples, of correlation, etc. are strictly valid only for random samples, 
the operation of randomization is of paramount importance in “the comparison 
of new designs, new materials or alloys, study of contact phenomena under 
different conditions, corrosion of materials under different atmospheric conditions,
and field trial of equipment, to mention only a few.” If randomization
is not undertaken, observed differences between designs, for instance, may have 
arisen from non-random assignable differences in the material presented. Fur- 
thermore, the validity of tests for significant differences between the effects 
of various designs rests upon the condition that the variability observed in 
the effects of each design be of random character and free from trends and 
non-random shifts in magnitude; i.e., the operation of determining the effects
of each design must be in a state of statistical control, to use a phrase employed
in quality control. 

(2) Experimental design. Without careful attention to the lay-out of an 
experiment, the data it yields may be difficult and even impossible to interpret. 
Therefore, the principles of experimental design set forth by R. A. Fisher and 
his followers are of great importance, as are also the special experimental ar- 
rangements which have been devised to cope with many of the more usual 
difficulties met in practice. 

IV. Personnel Selection. The allocation of individuals to places where 
they can be of greatest value in the National Defense Program will undoubt- 
edly require tests for mental and physical traits. Although the development 
and analysis of such tests is largely in the hands of psychometric groups, the 
use of methods of multivariate statistical analysis in such work renders this 
field one in which mathematical statistics ought to play an important role.


It is in the above four fields that there is special need for the training and 
endowments of the mathematical statistician. He can also render valuable 
assistance in the following fields: 

V. Stores and Supplies.

(1) Problem of sizes. Preliminary surveys are likely to prove useful in 
ascertaining the relative frequencies of demand for the respective sizes of cloth- 
ing, etc. in different parts of the country. 

(2) Development of procedures for charting the day to day location and move- 
ment of stores and supplies. 

(3) Problem of replacement of parts and equipment. In many cases it is more
economical to make replacement at statistically determined times than to wait
for complete failure. 

VI. Transportation and Communication. Probability theory has shown 
its usefulness in peace time in handling “traffic” problems that arise in telephone 
and telegraph communication, electric power distribution, etc. No doubt it
will find corresponding application to problems in these fields arising out of the 
National Defense Program. 

VII. Gunnery and Bombing. Although there is a need in connection with 
artillery fire for further development of methods of estimating standard devia- 
tions from successive differences in order to minimize the biases arising from 
slowly changing conditions during the period of firing, the principles of artillery 
fire are quite firmly established and the relatively new science of bombing is 
likely to present greater opportunities for the application of the methods of 
mathematical statistics. For instance, in evaluating bombing techniques 
there is need of statistical methods in separating the constant biases from the 
random variability. 
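The successive-difference idea mentioned under VII can be sketched as follows. We assume the mean-square-successive-difference form, in which $\sigma^2$ is estimated by half the mean of the squared differences between consecutive rounds. The firing data are hypothetical: a true dispersion of 1.0 plus a slow linear drift in the point of impact.

```python
import math
import random
import statistics


def successive_difference_sd(xs):
    # delta^2 = sum (x_{i+1} - x_i)^2 / (n - 1); sigma^2 is estimated by delta^2 / 2.
    diffs = [b - a for a, b in zip(xs, xs[1:])]
    delta2 = sum(d * d for d in diffs) / len(diffs)
    return math.sqrt(delta2 / 2.0)


rng = random.Random(7)
# Hypothetical rounds: true dispersion 1.0 plus a slowly drifting point of impact.
rounds = [0.05 * k + rng.gauss(0.0, 1.0) for k in range(200)]
print(statistics.stdev(rounds))          # inflated by the drift
print(successive_difference_sd(rounds))  # close to the true value 1.0
```

Because consecutive rounds are fired under nearly the same conditions, the drift contributes almost nothing to each difference, whereas the ordinary standard deviation absorbs the whole trend.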
VIII. Meteorology. The extent to which statistical methods are being
employed in meteorology can be seen from an examination of the Monthly
Weather Review Supplement No. 39, issued April 1940, and entitled “Reports
on Critical Studies of Methods of Long-Range Weather Forecasting.” There
seems to be excellent opportunity here for the application of methods of
multivariate analysis and for the development and uses of methods applicable to
serially correlated data. Such work would be of value in National Defense 
so far as it would enable the forecasting of conditions suitable for launching an 
attack. 

IX. Medicine. The National Defense Program will probably require the pre- 
paration and storage of hormone substances, toxic compounds, drugs, and other 
medicinal supplies. Since many such are examined for potency, toxicity, etc. 
by means of animal assays, there will be considerable opportunity here for 
the sound application of mathematical statistics in planning and interpreting 
these bioassays. 

In nearly all of the above activities the application of mathematical statistics 
is likely to encounter two major difficulties: 

(1) Obtaining an adequate trial of the methods of mathematical statistics. 

(2) Supplying persons to occupy key positions in the application of mathematical
statistics in a given field, that is, persons competent in mathematical statistics
who also possess a sound background in the field of application.

In some of the above activities, e.g. Quality Control, there will be the further 
difficulty of 

(3) Supplying the vast number of slightly trained workers who will gather 
the data and perform the analyses. 

It is with these difficulties in mind that the Committee recommends that the 
Institute 

(1) Prepare a register of Institute members, stating for each member his 
background, interests, and experience so far as these relate to mathematical 
statistics and its applications;²

(2) Appoint a committee to handle inquiries concerning personnel qualified 
to deal with particular projects; 

(3) Cooperate to the fullest extent in matters pertaining to quality control 
and specification with the Joint Committee for the Development of Statistical 
Applications in Engineering and Manufacturing, of which the Institute is a
sponsor.³

² The preparation of this register should be coordinated with any similar undertaking
sponsored by the National Roster of Scientific and Specialized Personnel, National
Resources Planning Board, Executive Office of the President, Washington, D. C.

³ We suggest the following as possible undertakings in a cooperative program with the
Joint Committee: 

(1) Requesting statements regarding the potential contribution to National Defense 
(4) Undertake such steps as are feasible which will lead to cooperation with
other organizations having interests similar to those of the Institute, e.g. the 
American Statistical Association, the Psychometric Society, and the Econo- 
metric Society. 

(5) Establish contact with the National Defense Research Committee headed 
by Dr. Vannemar Bush and coordinate the Institute's activities with those 
of this national Committee. 


In conclusion, we feel that as an organized group the Institute's primary 
function in relation to the National Defense Program should be to serve as a 
reservoir of specialists, experienced in the use of the methods of mathematical 
statistics, who can direct the use of these methods and be of assistance in the 
development of new techniques as needed. As a secondary, but equally im- 
portant function, the Institute is in a position to supervise, and perhaps to 
undertake through the activities of its individual members, the training in 
mathematical statistics of the individuals who will be needed in the application 
of whatever statistical programs of the type noted above are undertaken in 
connection with the National Defense Program. It is recommended^ therefore^ 
that the Institute's interest in the above activities^ and its willingness to be called 
upoUy be adequately publicizedy possibly by sending copies of this report to various 
members of the Government, such as the Chief Signal Officer and the Coordina- 


of statistical methods in quality control and specification from men prominent in industry 
who are familiar with recent developments in quality control. Such individuals would 
be asked to give, where possible, concrete evidence of the value of such methods in their 
experience— evidence which would be helpful in securing authoritative acceptance of 
statistical methods in quality control and specification. 

(2) The organization of a syllabus on statistical methods for use in evening courses 
at various industrial centers. (Captain Simon of our Committee is preparing *‘An En- 
gineer’s Manual of Statistical Methods” which will be issued shortly.) 

(3) The preparation of a list of topics for inclusion in university courses. 

(4) The preparation of a list of suggested reading on statistical methods in quality 
control and specification, arranged under such headings as “expository,” “methodology,” 
etc. 

(5) The arrangement of local meetings and round table discussions at some of the uni- 
versities in a few large industrial centers. Some well known leader of the locality might 
serve as chairman. To such a meeting would be invited those men in local industries who 
were interested in the possibility of applying statistical methods to their problems, and 
the meeting could be thrown open to discussion after a brief paper outlining the accom- 
plishments of statistical methods of quality control in the speaker’s experience and stating 
the advantages to be gained by employing such methods in the mass production of the 
War Preparedness Program. 

(6) Sponsor the preparation of popular expository articles on quality control for in- 
dustrial journals. Readers Digest, Scientific American, etc., and other activities designed 
to popularize the subject and gain authoritative acceptance of statistical methods of 
quality control. 
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tor of National Defense Purchases and also to the secretaries of appropriate 
organizations, such as the American Standards Association, with the request 
that they advise the Institute of any specific action they feel the Institute 
should take. 

A. T. Craig 

E. G. Olds 

L. E. Simon 

R. E. Wareham

C. Eisenhart, Chairman. 